Let $X_1, X_2, \ldots, X_n$ be a sequence of independent or locally dependent random variables taking
values in $\mathbb{Z}_+$. In this paper, we derive sharp bounds, via a new probabilistic method, for
the total variation distance between the distribution of the sum $\sum_{i=1}^{n} X_i$ and an appropriate
Poisson or compound Poisson distribution. These bounds include a factor which depends on the
smoothness of the approximating Poisson or compound Poisson distribution. This "smoothness
factor" is of order $O(\sigma^{-2})$, according to a heuristic argument, where $\sigma^2$ denotes the variance
of the approximating distribution. In this way, we offer sharp error estimates for a large range
of values of the parameters. Finally, specific examples concerning appearances of rare runs in
sequences of Bernoulli trials are presented by way of illustration.

Keywords: compound Poisson approximation; coupling inequality; law of small numbers; 
locally dependent random variables; Poisson approximation; rate of convergence; total 
variation distance; Zolotarev's ideal metric of order 2 

1. Introduction and overview 

Let $X_1, X_2, \ldots, X_n$ be a sequence of independent or locally dependent random variables
which take values in $\mathbb{Z}_+$. If $X_1, X_2, \ldots, X_n$ rarely differ from zero (that is, $P(X_i \neq 0) \approx 0$),
then it is well known that the distribution of their sum can be efficiently approximated 
by an appropriate Poisson or compound Poisson distribution. This situation appears in 
a great number of applications involving locally dependent and rare events, such as risk 
theory, extreme value theory, reliability theory, run and scan statistics, graph theory and 
biomolecular sequence analysis. 

The main method used so far for establishing effective Poisson or compound Poisson
approximation results in the case of independent or dependent random variables is
the much acclaimed Stein-Chen method (see, for example, Barbour, Holst and Janson
(1992), Barbour and Chryssaphinou (2001), Barbour and Chen (2005) and the references
therein). Another method for independent random variables is Kerstan's method
(see Roos (2003) and the references therein).

In recent years, an alternative methodology has been developed in a series of papers
concerning compound Poisson approximation for sums or processes of dependent random 
variables, employing probabilistic techniques, that is, properties of certain probability 
metrics, stochastic orders and coupling techniques (see Boutsikas and Koutras (2000, 
2001), Boutsikas and Vaggelatou (2002), Boutsikas (2006)). In this series of papers, the 
error estimates are, under analogous assumptions, of almost the same nature and the same 
order as the error estimates developed by the Stein-Chen method. The main shortcoming 
of these bounds, though, is that they do not incorporate any so-called "magic factor" 
(however, in the process approximation case treated in Boutsikas (2006), such a factor 
cannot be present). This factor, also known as a Stein factor, appears in approximation 
error estimates obtained through the Stein-Chen method and decreases as the parameter 
of the Poisson distribution increases. 

The purpose of this work is to derive sharp error bounds for the total variation distance
between the distribution of the sum of integer-valued random variables and an appropriate
Poisson or compound Poisson distribution. Specifically, by assuming that the random
variables $X_1, X_2, \ldots, X_n$ are locally dependent (in the strict sense of $k$-dependence), we
derive bounds similar in nature to those obtained by the Stein-Chen method that include
a factor analogous to a Stein factor. This factor is smaller than the associated Stein
factors, thereby offering (for a large range of the values of the parameters) sharper bounds
than the corresponding ones derived via the Stein-Chen method. This factor is just the $L_1$-norm,
$\|\Delta^2 f\|_1$, of the second difference of the probability distribution function $f$ of the
approximating Poisson or compound Poisson distribution. It decreases as $f$ becomes smoother,
which, in our case, usually happens when the variance of the distribution corresponding
to $f$ increases. Hence, we shall often refer to this factor as the smoothness factor. The
methodology we employ is based on a modification of Lindeberg's method, along with
the coupling inequality of Lemma 4 and the smoothing inequality (which produces the
aforementioned smoothness factor) of Lemma 1.

It is worth pointing out an undesired effect of our treatment, which is an additional 
term in the proposed bounds that does not appear in Stein-Chen bounds. This term 
becomes large for a certain range of values of the parameters, but, as we explain in 
Remark 3 of Section 3, it can be substantially reduced if we possess a simple and effective 
upper bound for $\|\Delta^2 f\|_1$. Nevertheless, this term is generally negligible, especially for
small or moderate values of $\lambda$, where $\lambda$ is the parameter of the approximating Poisson
distribution. 

It is worth stressing that the error estimates presented in this work have the same 
optimal order as other bounds obtained through the Stein-Chen method. In fact, bounds 
derived using the latter method contain an additional $\log\lambda$ term or, worse, an $e^{\lambda}$ term
for certain ranges of the parameters (see Barbour, Chen and Loh (1992), Barbour and 
Utev (1999), Barbour and Xia (2000) or Barbour and Chryssaphinou (2001) and the 
references therein). On the other hand, our bounds do not include such terms and they 
incorporate a better and more natural factor which we conjecture to be optimal.

The paper is organized as follows. In Section 2, we present some already known, as
well as new, auxiliary lemmas which concern probability metrics and coupling techniques.
These lemmas will be used for the derivation of our main results. In Sections 3 and 4,
we present our main results, that is, bounds concerning Poisson and compound Poisson
approximation for sums of independent and $k$-dependent random variables, respectively.
Finally, in Section 5, in order to illustrate the applicability and effectiveness of our main 
results, we present a simple example of an application which concerns the distribution of 
appearances of rare runs in sequences of independent and identically distributed (i.i.d.) 
trials. 



2. Preliminary results 

Throughout this paper, the abbreviations c.d.f. and p.d.f. will stand for the cumulative
distribution function and probability density function, respectively. In addition, $\mathcal{L}X$ or
$\mathcal{L}(X)$ will denote the distribution of a random variable $X$ and the notation $X \sim G$
will imply that $X$ follows the distribution $G$. Moreover, we shall write $\mathrm{Po}(\lambda)$ to denote
the Poisson distribution with mean $\lambda$ and $\mathrm{CP}(\lambda, F)$ to denote the compound Poisson
distribution with Poisson parameter $\lambda$ and compounding distribution $F$. In other words,
$\mathrm{CP}(\lambda, F)$ is the distribution of the random sum $\sum_{i=1}^{N} Z_i$, where $N \sim \mathrm{Po}(\lambda)$ and the $Z_i$ are
i.i.d. random variables with c.d.f. $F$ which are also independent of $N$. For two functions
$f$ and $g$, the following standard notation will be used:

$$f(t) \sim g(t) \ \text{as } t \to t_0 \ \text{ if } \lim_{t \to t_0} \frac{f(t)}{g(t)} = 1; \qquad f(t) = O(g(t)) \ \text{ if } \frac{f(t)}{g(t)} \text{ is bounded.}$$

Moreover, whenever dependence or independence of some random variables is mentioned,
it will be implicitly assumed that they are defined on the same probability space.
Finally, $\lfloor x \rfloor$ denotes the integer part of $x$ and we will assume that $\sum_{i=a}^{b} x_i = 0$ when
$a > b$.



2.1. Probability metrics and smoothness factors 

In order to quantify the quality of a distribution approximation, the total variation 
distance and Zolotarev's ideal metric of order 2 will be used. Since the results of this 
paper concern discrete distributions, it suffices to consider only the discrete versions of 
the aforementioned probability metrics. 

The total variation distance between the distributions $\mathcal{L}X$ and $\mathcal{L}Y$ of two random
variables $X$ and $Y$ is defined by

$$d_{TV}(\mathcal{L}X, \mathcal{L}Y) := \sup_{A \subseteq \mathbb{Z}} |P(X \in A) - P(Y \in A)| = \frac{1}{2} \sum_{k=-\infty}^{\infty} |P(X = k) - P(Y = k)|,$$

whereas the total variation distance of order 2 or Zolotarev's ideal metric of order 2
(Zolotarev (1983)) is defined by

$$\zeta_2(\mathcal{L}X, \mathcal{L}Y) := \int_{-\infty}^{\infty} |E(X-t)_+ - E(Y-t)_+| \, dt = \sum_{k=-\infty}^{\infty} \Big| \sum_{u=k}^{\infty} (F_X(u) - F_Y(u)) \Big|,$$

where, as usual, $F_X$ denotes the c.d.f. of the random variable $X$. Throughout, whenever
a $\zeta_2(\mathcal{L}X, \mathcal{L}Y)$ distance appears, it will be implicitly assumed that $X, Y$ possess finite
first and second moments and that $E(X) = E(Y)$. For a comprehensive exposition on
probability metrics and their properties, the interested reader may consult Rachev (1991)
and the references therein.

Next, we denote by $\Delta^k f$ the $k$th order (backward) difference operator over a function
$f: \mathbb{Z} \to \mathbb{R}$, that is, $\Delta f(i) = f(i) - f(i-1)$ and $\Delta^k f = \Delta(\Delta^{k-1} f)$, $k = 1, 2, \ldots$ ($\Delta^0 f = f$).
The smoothness factor mentioned in the Introduction emerges from the following lemma.
Analogous results concerning random variables with a Lebesgue density have been used
in the past in order to obtain Berry-Esseen-type results (see Senatov (1980), Rachev
(1991) and the references therein).

Lemma 1. If $X, Y, Z$ are integer-valued random variables (with finite first and second
moments) such that $E(X) = E(Y)$ and $Z$ is independent of $X, Y$, then

$$d_{TV}(\mathcal{L}(X + Z), \mathcal{L}(Y + Z)) \le \tfrac{1}{2} \|\Delta^2 f_Z\|_1 \, \zeta_2(\mathcal{L}X, \mathcal{L}Y),$$

where $f_Z$ is the p.d.f. of $Z$ and $\|\Delta^2 f_Z\|_1 := \sum_{z \in \mathbb{Z}} |\Delta^2 f_Z(z)|$.

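Proof. For any functions $a, b: \mathbb{Z} \to \mathbb{R}$ and $c, d \in \mathbb{Z}$, we have (second-order Abel summation
formula)

$$\sum_{z=c}^{d} b_{z-2} \Delta^2 a_z = \sum_{z=c}^{d} a_z \Delta^2 b_z + b_{d-1} \Delta a_d - a_d \Delta b_d + a_{c-1} \Delta b_{c-1} - b_{c-2} \Delta a_{c-1}. \tag{1}$$

Denote by $f_W$ the p.d.f. of any discrete random variable $W$. If, for fixed $k$, we now choose

$$a_z = f_Z(z), \qquad b_z = \sum_{i=-\infty}^{z+1} (R_X(k-i) - R_Y(k-i)),$$

where $R_X(k-z) = \sum_{i=-\infty}^{z-1} f_X(k-i)$, and then take $c \to -\infty$, $d \to \infty$, identity (1) leads
to

$$\sum_{z=-\infty}^{\infty} \sum_{i=-\infty}^{z-1} (R_X(k-i) - R_Y(k-i)) \Delta^2 f_Z(z) = \sum_{z=-\infty}^{\infty} (f_X(k-z) - f_Y(k-z)) f_Z(z) \tag{2}$$

since all quantities $b_z, a_z, \Delta a_z, \Delta b_z$ vanish as $z \to \infty$ or $z \to -\infty$. Using (2), we get

$$\begin{aligned}
d_{TV}(\mathcal{L}(X + Z), \mathcal{L}(Y + Z)) &= \frac{1}{2} \sum_{k=-\infty}^{\infty} \Big| \sum_{z=-\infty}^{\infty} (f_X(k-z) - f_Y(k-z)) f_Z(z) \Big| \\
&= \frac{1}{2} \sum_{k=-\infty}^{\infty} \Big| \sum_{z=-\infty}^{\infty} \Delta^2 f_Z(z) \sum_{i=-\infty}^{z-1} (R_X(k-i) - R_Y(k-i)) \Big| \\
&\le \frac{1}{2} \sum_{z=-\infty}^{\infty} |\Delta^2 f_Z(z)| \sum_{k=-\infty}^{+\infty} \Big| \sum_{i=-\infty}^{z-1} (R_X(k-i) - R_Y(k-i)) \Big|.
\end{aligned}$$

Finally, setting $s := k - z + 1$ and $u := k - i$ in the second and third summation above
yields

$$d_{TV}(\mathcal{L}(X + Z), \mathcal{L}(Y + Z)) \le \frac{1}{2} \sum_{z=-\infty}^{\infty} |\Delta^2 f_Z(z)| \sum_{s=-\infty}^{\infty} \Big| \sum_{u=s}^{\infty} (R_X(u) - R_Y(u)) \Big| = \frac{1}{2} \|\Delta^2 f_Z\|_1 \, \zeta_2(\mathcal{L}X, \mathcal{L}Y).$$

□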
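
The inequality of Lemma 1 can be checked numerically on a small example. The following Python sketch is only an illustration under stated assumptions: it takes $X \sim$ Bernoulli($p$), $Y \sim \mathrm{Po}(p)$ (so that $E(X) = E(Y)$) and an independent $Z \sim \mathrm{Po}(\lambda)$, truncates the Poisson tails, and uses the discrete form of $\zeta_2$ as a sum of absolute tail sums of c.d.f. differences. All function names are hypothetical helpers introduced here.

```python
import math

def poisson_pmf(lam, kmax):
    """Po(lam) probabilities on 0..kmax (the negligible tail beyond kmax is dropped)."""
    return [math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))
            for k in range(kmax + 1)]

def convolve(f, g):
    """p.d.f. of the sum of two independent variables with p.d.f.s f, g on 0, 1, 2, ..."""
    out = [0.0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] += a * b
    return out

def pad(f, n):
    """Extend a p.d.f. with zeros up to length n."""
    return list(f) + [0.0] * (n - len(f))

def d_tv(f, g):
    """Total variation distance: half the L1 distance between the p.d.f.s."""
    n = max(len(f), len(g))
    return 0.5 * sum(abs(a - b) for a, b in zip(pad(f, n), pad(g, n)))

def zeta2(f, g):
    """Discrete zeta_2: sum over k of |sum_{u >= k} (F_X(u) - F_Y(u))| (equal means assumed)."""
    n = max(len(f), len(g))
    cdf_diff, cf, cg = [], 0.0, 0.0
    for a, b in zip(pad(f, n), pad(g, n)):
        cf, cg = cf + a, cg + b
        cdf_diff.append(cf - cg)
    total, tail = 0.0, 0.0
    for d in reversed(cdf_diff):      # accumulate the tail sums from the right
        tail += d
        total += abs(tail)
    return total

def second_diff_l1(f):
    """||Delta^2 f||_1 for a p.d.f. supported on 0..len(f)-1."""
    e = [0.0, 0.0] + list(f) + [0.0, 0.0]
    return sum(abs(e[i + 2] - 2 * e[i + 1] + e[i]) for i in range(len(e) - 2))

p, lam = 0.3, 4.0
f_x = [1 - p, p]                 # X ~ Bernoulli(p)
f_y = poisson_pmf(p, 40)         # Y ~ Po(p), same mean as X
f_z = poisson_pmf(lam, 80)       # Z ~ Po(lam), independent of X and Y

lhs = d_tv(convolve(f_x, f_z), convolve(f_y, f_z))
rhs = 0.5 * second_diff_l1(f_z) * zeta2(f_x, f_y)
print(lhs, rhs)                  # Lemma 1 asserts lhs <= rhs
```

For a point mass at zero, `second_diff_l1([1.0])` evaluates the norm of the sequence $(1, -2, 1)$, which is the value used when $Z = 0$.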



If $Z = 0$ and $E(X) = E(Y)$, then a simple consequence of the above result is the
inequality

$$d_{TV}(\mathcal{L}X, \mathcal{L}Y) \le \tfrac{1}{2} \|\Delta^2 f_0\|_1 \, \zeta_2(\mathcal{L}X, \mathcal{L}Y) = 2 \zeta_2(\mathcal{L}X, \mathcal{L}Y), \tag{3}$$

where $f_0 := f_Z$ when $Z = 0$. If $Z$ follows a Poisson distribution with parameter $\lambda$, then
we can find the explicit value of $\|\Delta^2 f_Z\|_1$ and its asymptotic behavior. In the sequel, we
shall write $f_{\mathrm{Po}(\lambda)}$ instead of $f_Z$ when $Z \sim \mathrm{Po}(\lambda)$. As we will see below, it is convenient to
first find the $L_\infty$-norm, $\|\Delta f_{\mathrm{Po}(\lambda)}\|_\infty$, and then to investigate its relation with the norm
$\|\Delta^2 f_{\mathrm{Po}(\lambda)}\|_1$.



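Lemma 2. If $f_{\mathrm{Po}(\lambda)}$ denotes the probability distribution function of the Poisson
distribution with parameter $\lambda$, then

$$\|\Delta f_{\mathrm{Po}(\lambda)}\|_\infty = \sup_{k} |f_{\mathrm{Po}(\lambda)}(k) - f_{\mathrm{Po}(\lambda)}(k-1)| = e^{-\lambda} \frac{\lambda^{k_\lambda}}{k_\lambda!} \Big(1 - \frac{k_\lambda}{\lambda}\Big),$$

where $k_\lambda := \lfloor \lambda - \sqrt{\lambda + 1/4} + 1/2 \rfloor$ for all $\lambda > 0$. In particular, $\|\Delta f_{\mathrm{Po}(\lambda)}\|_\infty = e^{-\lambda}$ for
$\lambda < 2$. Furthermore,

$$\|\Delta f_{\mathrm{Po}(\lambda)}\|_\infty \sim \frac{1}{\lambda \sqrt{2\pi e}} \quad \text{as } \lambda \to \infty.$$

Proof. It can be easily verified that $\Delta f_{\mathrm{Po}(\lambda)}(k) = e^{-\lambda} \frac{\lambda^k}{k!} (1 - \frac{k}{\lambda})$, $k \in \{0, 1, 2, \ldots\}$, while
$\Delta f_{\mathrm{Po}(\lambda)}(k) = 0$ for $k < 0$, and also that

$$\Delta^2 f_{\mathrm{Po}(\lambda)}(k) = e^{-\lambda} \frac{\lambda^k}{k!} \Big(1 + \frac{k(k-1)}{\lambda^2} - \frac{2k}{\lambda}\Big), \quad k \in \{0, 1, 2, \ldots\},$$

while $\Delta^2 f_{\mathrm{Po}(\lambda)}(k) = 0$ for $k < 0$. Define $h: \mathbb{R}_+ \to \mathbb{R}$ such that $h(x) = \Delta^2 f_{\mathrm{Po}(\lambda)}(x)$ (that
is, the extension of $\Delta^2 f_{\mathrm{Po}(\lambda)}$ over $\mathbb{R}_+$), where $x!$ now denotes the Gamma function
$\Gamma(1 + x)$. It is easy to verify that $h$ is positive when $0 \le x < \rho_1$, negative when
$\rho_1 < x < \rho_2$ and positive again when $x > \rho_2$, where $\rho_1 = \rho_1(\lambda) = \lambda - \sqrt{\lambda + 1/4} + 1/2$
and $\rho_2 = \rho_2(\lambda) = \lambda + \sqrt{\lambda + 1/4} + 1/2$ are the two roots of the equation $h(x) = 0$
($0 < \rho_1 < \rho_2$). Since $h$ is an extension of $\Delta^2 f_{\mathrm{Po}(\lambda)}$, we deduce that $\Delta^2 f_{\mathrm{Po}(\lambda)}(k) > 0$ when
$0 \le k < \rho_1$, $\Delta^2 f_{\mathrm{Po}(\lambda)}(k) < 0$ when $\rho_1 < k < \rho_2$ and $\Delta^2 f_{\mathrm{Po}(\lambda)}(k) > 0$ when $k > \rho_2$. This
implies that $0 = \Delta f_{\mathrm{Po}(\lambda)}(-1) \le \Delta f_{\mathrm{Po}(\lambda)}(0) \le \cdots \le \Delta f_{\mathrm{Po}(\lambda)}(\lfloor \rho_1 \rfloor)$, while $\Delta f_{\mathrm{Po}(\lambda)}(\lfloor \rho_1 \rfloor) \ge
\Delta f_{\mathrm{Po}(\lambda)}(\lfloor \rho_1 \rfloor + 1) \ge \cdots \ge \Delta f_{\mathrm{Po}(\lambda)}(\lfloor \rho_2 \rfloor)$ and $\Delta f_{\mathrm{Po}(\lambda)}(\lfloor \rho_2 \rfloor) \le \Delta f_{\mathrm{Po}(\lambda)}(\lfloor \rho_2 \rfloor + 1) \le \cdots$.
Hence, $|\Delta f_{\mathrm{Po}(\lambda)}(k)|$ must be maximized at $\lfloor \rho_1 \rfloor$ or $\lfloor \rho_2 \rfloor$ (since $\Delta f_{\mathrm{Po}(\lambda)}(k) \to 0$ as $k \to \infty$).
In order to verify that it is maximized at $k_\lambda = \lfloor \rho_1(\lambda) \rfloor$, we shall prove that $g_1(\lambda) > g_2(\lambda)$
for all $\lambda > 0$, where $g_1(\lambda) = \lambda |\Delta f_{\mathrm{Po}(\lambda)}(\lfloor \rho_1(\lambda) \rfloor)|$ and $g_2(\lambda) = \lambda |\Delta f_{\mathrm{Po}(\lambda)}(\lfloor \rho_2(\lambda) \rfloor)|$, that is,

$$\lambda e^{-\lambda} \frac{\lambda^{\lfloor \rho_1(\lambda) \rfloor}}{\lfloor \rho_1(\lambda) \rfloor !} \Big(1 - \frac{\lfloor \rho_1(\lambda) \rfloor}{\lambda}\Big) > \lambda e^{-\lambda} \frac{\lambda^{\lfloor \rho_2(\lambda) \rfloor}}{\lfloor \rho_2(\lambda) \rfloor !} \Big(\frac{\lfloor \rho_2(\lambda) \rfloor}{\lambda} - 1\Big), \quad \lambda > 0.$$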



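For every $k \in \{0, 1, \ldots\}$, $\varepsilon \in [0, 1)$, we have $\lfloor \rho_1(k + \varepsilon + \sqrt{k + \varepsilon}) \rfloor = \lfloor k + \varepsilon \rfloor = k$. Therefore,
$\lfloor \rho_1(\lambda) \rfloor = k$ for every $\lambda \in [k + \sqrt{k}, k + 1 + \sqrt{k+1})$. Hence, in this interval, the function
$g_1(\lambda)$ is equal to $\lambda e^{-\lambda} \frac{\lambda^k}{k!} (1 - \frac{k}{\lambda})$, differentiable (except at $k + \sqrt{k}$) and concave, and
$g_1'(\lambda) = 0$ at $\lambda = a(k) = k + 1/2 + \sqrt{k + 1/4}$. Moreover, $g_1(\lambda) \to g_1(k + 1 + \sqrt{k+1})$ as
$\lambda \to k + 1 + \sqrt{k+1}$ and thus $g_1(\lambda)$ is continuous for every $\lambda > 0$. Therefore, $g_1(\lambda) \ge
g_1(k + \sqrt{k})$ for every $\lambda \in [a(k-1), a(k)]$, $k \in \{1, 2, \ldots\}$. Using the upper bound of Stirling's
approximation ($k! < k^k e^{-k} \sqrt{2\pi k} \, e^{1/(12k)}$) and the elementary inequality $\log(1 + x) > x -
x^2/2 + x^3/3 - x^4/4$, $x > 0$, we get

$$g_1(k + \sqrt{k}) = e^{-(k + \sqrt{k})} \frac{(k + \sqrt{k})^k \sqrt{k}}{k!} > \frac{e^{-\sqrt{k} - 1/(12k)}}{\sqrt{2\pi}} \, e^{k \log(1 + 1/\sqrt{k})} > \frac{e^{1/(3\sqrt{k}) - 1/(3k)}}{\sqrt{2\pi e}} \ge \frac{1}{\sqrt{2\pi e}}$$

for every $k \ge 1$. Therefore, $g_1(\lambda) > \frac{1}{\sqrt{2\pi e}}$ for every $\lambda \in \bigcup_{k \ge 1} [a(k-1), a(k)] = [1, \infty)$.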

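Similarly, for every $k \in \{1, 2, \ldots\}$, $\varepsilon \in [0, 1)$, we have $\lfloor \rho_2(k + \varepsilon - \sqrt{k + \varepsilon}) \rfloor = \lfloor k + \varepsilon \rfloor = k$.
Therefore, $\lfloor \rho_2(\lambda) \rfloor = k$ for every $\lambda \in [k - \sqrt{k}, k + 1 - \sqrt{k+1})$. Moreover, in this interval,
the function $g_2(\lambda)$ is equal to $\lambda e^{-\lambda} \frac{\lambda^k}{k!} (\frac{k}{\lambda} - 1)$, differentiable (except at $k - \sqrt{k}$) and
concave, and $g_2'(\lambda) = 0$ at $\lambda = k + 1/2 - \sqrt{k + 1/4}$ ($g_2(\lambda)$ is also continuous for every $\lambda >
0$). Therefore, $g_2(\lambda) \le g_2(k + 1/2 - \sqrt{k + 1/4})$ for every $\lambda \in [k - \sqrt{k}, k + 1 - \sqrt{k+1}]$, $k \in
\{1, 2, \ldots\}$. Using the lower bound of Stirling's approximation ($k! > k^k e^{-k} \sqrt{2\pi k}$) and the
elementary inequality $\log(1 + x) < x - x^2/2$, $x \in (-1, 0)$, we get, for $k \ge 1$,

$$\begin{aligned}
g_2\big(k + 1/2 - \sqrt{k + 1/4}\big) &< \frac{\sqrt{k + 1/4} - 1/2}{\sqrt{2\pi k}} \, e^{-1/2 + \sqrt{k + 1/4} + k \log(1 + (1/2 - \sqrt{k + 1/4})/k)} \\
&< \frac{\sqrt{k + 1/4} - 1/2}{\sqrt{k}} \cdot \frac{e^{(\sqrt{k + 1/4} - 1/2)/(2k)}}{\sqrt{2\pi e}} < \frac{1}{\sqrt{2\pi e}}.
\end{aligned}$$

Therefore, $g_2(\lambda) < \frac{1}{\sqrt{2\pi e}}$ for every $\lambda > 0$. Hence, $g_2(\lambda) < \frac{1}{\sqrt{2\pi e}} < g_1(\lambda)$ for every $\lambda \ge 1$.
It now remains to show that $g_2(\lambda) < g_1(\lambda)$ for every $0 < \lambda < 1$. This is easily verified
since $g_1(\lambda) = \lambda e^{-\lambda}$ for $\lambda < 2$, while $g_2(\lambda) = e^{-\lambda} \lambda (1 - \lambda)$ for $\lambda \in [0, 2 - \sqrt{2})$ and $g_2(\lambda) =
e^{-\lambda} \lambda^2 (1 - \frac{\lambda}{2})$ for $\lambda \in [2 - \sqrt{2}, 3 - \sqrt{3})$.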

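Finally, from $(1 + y)^{k_\lambda} = e^{k_\lambda \log(1 + y)} = e^{k_\lambda (y - y^2/2 + o(y^2))}$ with $y = (\lambda - k_\lambda)/k_\lambda$, we get

$$e^{k_\lambda - \lambda} (\lambda / k_\lambda)^{k_\lambda} \to e^{-1/2} \quad \text{as } \lambda \to \infty.$$

From this fact and Stirling's formula, we get, as $\lambda \to \infty$, that

$$\|\Delta f_{\mathrm{Po}(\lambda)}\|_\infty = e^{-\lambda} \frac{\lambda^{k_\lambda}}{k_\lambda!} \Big(1 - \frac{k_\lambda}{\lambda}\Big) \sim \frac{e^{k_\lambda - \lambda} (\lambda / k_\lambda)^{k_\lambda}}{\sqrt{2\pi k_\lambda}} \cdot \frac{\lambda - k_\lambda}{\lambda} \sim \frac{1}{\lambda \sqrt{2\pi e}}.$$

□

In the next lemma, we find the explicit value of $\|\Delta^2 f_{\mathrm{Po}(\lambda)}\|_1$ and a convenient upper
bound in terms of $\|\Delta f_{\mathrm{Po}(\lambda)}\|_\infty$.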

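Lemma 3. If $f_{\mathrm{Po}(\lambda)}$ denotes the p.d.f. of the Poisson distribution with parameter $\lambda$,
then

$$\|\Delta^2 f_{\mathrm{Po}(\lambda)}\|_1 = \sum_{z=0}^{+\infty} |\Delta^2 f_{\mathrm{Po}(\lambda)}(z)| = 2 e^{-\lambda} \Big( \frac{\lambda^{k_\lambda - 1} (\lambda - k_\lambda)}{k_\lambda!} - \frac{\lambda^{u_\lambda - 1} (\lambda - u_\lambda)}{u_\lambda!} \Big),$$

where $k_\lambda := \lfloor \lambda - \sqrt{\lambda + 1/4} + 1/2 \rfloor$ and $u_\lambda := \lfloor \lambda + \sqrt{\lambda + 1/4} + 1/2 \rfloor$. Moreover,

$$\|\Delta^2 f_{\mathrm{Po}(\lambda)}\|_1 \le 4 \|\Delta f_{\mathrm{Po}(\lambda)}\|_\infty \quad \text{and} \quad \|\Delta^2 f_{\mathrm{Po}(\lambda)}\|_1 \sim \frac{4}{\lambda \sqrt{2\pi e}} \ \text{ as } \lambda \to \infty.$$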



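Proof. For convenience, we set $k_\lambda := \lfloor \rho_1 \rfloor$ and $u_\lambda := \lfloor \rho_2 \rfloor$, where $\rho_1 := \lambda - \sqrt{\lambda + 1/4} +
1/2$ and $\rho_2 := \lambda + \sqrt{\lambda + 1/4} + 1/2$, and $g(z) := \Delta f_{\mathrm{Po}(\lambda)}(z)$. In the proof of Lemma 2, we
have seen that $0 = g(-1) \le g(0) \le \cdots \le g(k_\lambda)$, while $g(k_\lambda) \ge g(k_\lambda + 1) \ge \cdots \ge g(u_\lambda)$ and
$g(u_\lambda) \le g(u_\lambda + 1) \le \cdots$. We then have

$$\begin{aligned}
\|\Delta^2 f_{\mathrm{Po}(\lambda)}\|_1 = \sum_{z=0}^{+\infty} |\Delta g(z)| &= \sum_{z=0}^{k_\lambda} \Delta g(z) - \sum_{z=k_\lambda + 1}^{u_\lambda} \Delta g(z) + \sum_{z=u_\lambda + 1}^{+\infty} \Delta g(z) \\
&= (g(k_\lambda) - g(-1)) - (g(u_\lambda) - g(k_\lambda)) + (0 - g(u_\lambda)) \\
&= 2 (g(k_\lambda) - g(u_\lambda)).
\end{aligned}$$

308 M. V. Boutsikas and E. Vaggelatou

From the proof of Lemma 2, we also get that

$$g(k_\lambda) = \Delta f_{\mathrm{Po}(\lambda)}(k_\lambda) = \max_z \Delta f_{\mathrm{Po}(\lambda)}(z) = \|\Delta f_{\mathrm{Po}(\lambda)}\|_\infty; \qquad g(u_\lambda) = \min_z \Delta f_{\mathrm{Po}(\lambda)}(z) < 0$$

and $g(k_\lambda) > -g(u_\lambda)$. Therefore, we obtain that $\|\Delta^2 f_{\mathrm{Po}(\lambda)}\|_1 \le 4 g(k_\lambda) = 4 \|\Delta f_{\mathrm{Po}(\lambda)}\|_\infty$.
The last asymptotic result follows from the fact that $\Delta f_{\mathrm{Po}(\lambda)}(u_\lambda) \sim -\lambda^{-1} (2\pi e)^{-1/2}$,
which can be proven in exactly the same way as $\Delta f_{\mathrm{Po}(\lambda)}(k_\lambda) \sim \lambda^{-1} (2\pi e)^{-1/2}$ was proven
in Lemma 2. □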

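A crude but simple upper bound is $\|\Delta^2 f_{\mathrm{Po}(\lambda)}\|_1 \le 4 \frac{1 - e^{-3\lambda}}{3\lambda} \le 4 (1 \wedge \frac{1}{3\lambda})$ for all $\lambda > 0$,
whereas $\|\Delta f_{\mathrm{Po}(\lambda)}\|_\infty \le 1/(3\lambda)$ for $\lambda > 2$.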
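
The closed forms of Lemmas 2 and 3 are easy to compare against a direct summation. The following Python sketch is an illustration only: the function names are hypothetical, the brute-force sums are truncated at a cutoff chosen far in the Poisson tail, and the closed forms are evaluated through $g(j) = e^{-\lambda} \frac{\lambda^j}{j!} (1 - \frac{j}{\lambda})$ at $j = k_\lambda$ and $j = u_\lambda$.

```python
import math

def poisson_pmf(lam, k):
    """P(Po(lam) = k); zero for negative k."""
    if k < 0:
        return 0.0
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))

def norms_brute_force(lam):
    """Sup-norm of Delta f and L1-norm of Delta^2 f, summed over a long finite range."""
    kmax = int(lam + 40 * math.sqrt(lam) + 60)
    d1 = [poisson_pmf(lam, k) - poisson_pmf(lam, k - 1) for k in range(kmax + 1)]
    sup_norm = max(abs(d) for d in d1)
    # Delta^2 f(0) = Delta f(0) because Delta f(-1) = 0
    l1_norm = abs(d1[0]) + sum(abs(d1[k] - d1[k - 1]) for k in range(1, kmax + 1))
    return sup_norm, l1_norm

def norms_closed_form(lam):
    """Closed forms of Lemmas 2 and 3 in terms of k_lambda and u_lambda."""
    root = math.sqrt(lam + 0.25)
    k = math.floor(lam - root + 0.5)
    u = math.floor(lam + root + 0.5)
    g = lambda j: poisson_pmf(lam, j) * (1 - j / lam)   # Delta f(j) for j >= 0
    return g(k), 2 * (g(k) - g(u))

for lam in (0.5, 1.0, 2.0, 5.0, 50.0):
    print(lam, norms_brute_force(lam), norms_closed_form(lam))
```

For large $\lambda$ the second norm can also be compared with the asymptotic value $4 / (\lambda \sqrt{2\pi e})$.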

Remark 1. For distributions other than Poisson, it is not always easy to derive an
analytic expression for $\|\Delta f\|_\infty$ or $\|\Delta^2 f\|_1$. Nevertheless, it is always feasible to compute
the numeric value of these norms employing numerical or symbolic mathematics software
packages (for example, Mathematica, Maple or MATLAB).

An approximate expression for these norms can be easily derived if we assume that the
distribution corresponding to the p.d.f. $f$ can be approximated by a normal distribution
$N(\mu, \sigma^2)$, for example, due to the CLT. In this case, we expect that $\|\Delta f\|_\infty$ and $\|\Delta^2 f\|_1$
would be close to $\|f'_{N(\mu, \sigma^2)}\|_\infty$ and $\|f''_{N(\mu, \sigma^2)}\|_1$, respectively, where $f^{(k)}_{N(\mu, \sigma^2)}$ denotes the
$k$th order derivative of the p.d.f. of $N(\mu, \sigma^2)$. It is not difficult to verify that, for the
normal distribution, we have

$$\|f'_{N(\mu, \sigma^2)}\|_\infty = \sup_{x \in \mathbb{R}} |f'_{N(\mu, \sigma^2)}(x)| = \frac{1}{\sigma^2 \sqrt{2\pi e}}, \qquad \|f''_{N(\mu, \sigma^2)}\|_1 = \int_{-\infty}^{+\infty} |f''_{N(\mu, \sigma^2)}(x)| \, dx = 4 \|f'_{N(\mu, \sigma^2)}\|_\infty.$$

Hence, for distributions similar to the normal with variance $\sigma^2$, we expect $\|\Delta^2 f\|_1$ to be
nearly equal to $4 \sigma^{-2} (2\pi e)^{-1/2}$. This approximation works for the Poisson distribution
(as seen in Lemma 3 above) since, for large $\lambda$, it is close to a normal distribution
with $\sigma^2 = \lambda$. According to the above, concerning the compound Poisson distribution, if
$\mathrm{CP}(\lambda, F) \approx N(\lambda E(W), \lambda E(W^2))$ (with $W \sim F$) then we can expect that, for large $\lambda$,

$$\|\Delta^2 f_{\mathrm{CP}(\lambda, F)}\|_1 \approx \frac{4}{\lambda E(W^2) \sqrt{2\pi e}}. \tag{4}$$

It is worth stressing that (4) is valid provided the compounding distribution $F$ is
such that $\mathrm{CP}(\lambda, F)$ is approximately normal. There exist counterexamples showing that
(4) is not always valid; see Example 1.3 or 1.4 of Barbour and Utev (1999). Specifically,
the $\mathrm{CP}(\lambda, F)$ described there cannot be approximated by a normal distribution
and, moreover, it can be verified that the corresponding $\|\Delta^2 f_{\mathrm{CP}(\lambda, F)}\|_1$ does not
decrease as $\lambda$ increases. Note that Barbour and Utev (1999) use these counterexamples
to show that, even for independent $X_i$'s (with $p_i = P(X_i \neq 0)$, $\lambda = \sum p_i$), we cannot
always prove that $d_{TV}(\mathcal{L}(\sum X_i), \mathrm{CP}(\lambda, F)) = O(\lambda^{-1} \sum p_i^2)$ and sometimes (depending on
$F$) the order $O(\sum p_i^2)$ is optimal. Theorem 9 below implies that this $d_{TV}$ is of order
$O(\lambda^{-1} \sum p_i^2)$ whenever $F$ is such that $\|\Delta^2 f_{\mathrm{CP}(\lambda, F)}\|_1 = O(\lambda^{-1})$ (see also Remark 2 in
Section 3).
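
The heuristic (4) can be probed numerically when the compounding distribution has finite support on the positive integers. The Python sketch below is an illustration under assumptions not made in the text: it computes the p.d.f. of $\mathrm{CP}(\lambda, F)$ with Panjer's recursion (a standard technique, swapped in here for convenience), takes $F$ uniform on $\{1, 2\}$ and $\lambda = 50$ as an arbitrary example, and truncates the support at a point far in the tail. The function names are hypothetical.

```python
import math

def compound_poisson_pmf(lam, q, nmax):
    """p.d.f. of CP(lam, F) on 0..nmax by Panjer's recursion.

    q[j] = F({j}) for j >= 1, with q[0] = 0 (no mass at zero is assumed).
    """
    f = [math.exp(-lam)] + [0.0] * nmax
    for n in range(1, nmax + 1):
        s = sum(j * q[j] * f[n - j] for j in range(1, min(n, len(q) - 1) + 1))
        f[n] = lam * s / n
    return f

def second_diff_l1(f):
    """||Delta^2 f||_1 for a p.d.f. supported on 0..len(f)-1."""
    e = [0.0, 0.0] + list(f) + [0.0, 0.0]
    return sum(abs(e[i + 2] - 2 * e[i + 1] + e[i]) for i in range(len(e) - 2))

lam, q = 50.0, [0.0, 0.5, 0.5]            # F uniform on {1, 2} (example choice)
f_cp = compound_poisson_pmf(lam, q, 400)  # mean 75, variance 125: 400 is far in the tail
mu2 = sum(j * j * qj for j, qj in enumerate(q))   # second moment E(W^2) of F

exact = second_diff_l1(f_cp)
heuristic = 4 / (lam * mu2 * math.sqrt(2 * math.pi * math.e))
print(exact, heuristic)
```

When $F$ is such that $\mathrm{CP}(\lambda, F)$ is far from normal, the two printed values need not be close, in line with the counterexamples cited above.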

2.2. Coupling techniques 

A coupling of two random vectors $\mathbf{X}, \mathbf{Y} \in \mathbb{R}^k$ (to be more exact, of their distributions
$\mathcal{L}\mathbf{X}, \mathcal{L}\mathbf{Y}$) is considered to be any random vector $(\mathbf{X}', \mathbf{Y}')$ defined over a probability space
$(\Omega, \mathcal{F}, P)$ and taking values in a measurable space $(\mathbb{R}^{2k}, \mathcal{B}(\mathbb{R}^{2k}))$ with the same marginal
distributions as $\mathbf{X}, \mathbf{Y}$, that is, $\mathcal{L}\mathbf{X} = \mathcal{L}\mathbf{X}'$ and $\mathcal{L}\mathbf{Y} = \mathcal{L}\mathbf{Y}'$. Loosely speaking, a coupling
of $\mathbf{X}, \mathbf{Y}$ is any "definition" of $\mathbf{X}, \mathbf{Y}$ in the same probability space. This definition of
coupling can be generalized for $n$ random vectors in an obvious way. A well-known result
concerning the $d_{TV}$ is the so-called (basic) coupling inequality,

$$d_{TV}(\mathcal{L}\mathbf{X}, \mathcal{L}\mathbf{Y}) \le P(\mathbf{X}' \neq \mathbf{Y}'),$$

which is valid for any coupling $(\mathbf{X}', \mathbf{Y}')$ of two random vectors $\mathbf{X}, \mathbf{Y}$. It can be proven
that we can always construct a coupling $(\mathbf{X}', \mathbf{Y}')$ of $(\mathbf{X}, \mathbf{Y})$ such that $d_{TV}(\mathcal{L}\mathbf{X}, \mathcal{L}\mathbf{Y}) =
P(\mathbf{X}' \neq \mathbf{Y}')$ (for example, see Lindvall (1992), page 18). Such a coupling is called a
maximal coupling or $\gamma$-coupling of $\mathbf{X}, \mathbf{Y}$. All of the above could be expressed equivalently
for probability measures as follows: if $P_1, P_2$ are two probability measures on $(\mathbb{R}^k, \mathcal{B}(\mathbb{R}^k))$,
then any probability measure $P$ on $(\mathbb{R}^{2k}, \mathcal{B}(\mathbb{R}^{2k}))$ with $P(A \times \mathbb{R}^k) = P_1(A)$, $P(\mathbb{R}^k \times A) =
P_2(A)$ for every $A \in \mathcal{B}(\mathbb{R}^k)$ is called a coupling of $P_1, P_2$. Moreover, it can be proven that
there exists a coupling $P_\gamma$ of $P_1, P_2$, called a maximal coupling or $\gamma$-coupling, such that

$$d_{TV}(P_1, P_2) = 1 - P_\gamma(\{(\mathbf{x}, \mathbf{x}), \mathbf{x} \in \mathbb{R}^k\}). \tag{5}$$

Obviously, all of the above can be adapted in the obvious way for random vectors taking
values in $\mathbb{Z}^k$ and to multivariate distributions over the probability space $(\mathbb{Z}^k, 2^{\mathbb{Z}^k})$.
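
For discrete distributions, a maximal coupling can be written down explicitly. The following Python sketch shows one standard construction (not necessarily the one used in the cited references): the common mass $\min(p, q)$ is placed on the diagonal and the remaining mass is spread off the diagonal as a product measure. The dictionaries `p` and `q` and the function name are hypothetical examples.

```python
def maximal_coupling(p, q):
    """Joint p.d.f. of a maximal coupling of two discrete p.d.f.s.

    p, q: dicts mapping values to probabilities.
    Returns (joint, d), where joint maps (x, y) to probability and d is d_TV(p, q).
    """
    support = sorted(set(p) | set(q))
    overlap = {v: min(p.get(v, 0.0), q.get(v, 0.0)) for v in support}
    d = 1.0 - sum(overlap.values())           # total variation distance
    joint = {(v, v): m for v, m in overlap.items() if m > 0}
    if d > 0:
        for x in support:
            rx = p.get(x, 0.0) - overlap[x]   # mass of p not matched on the diagonal
            if rx <= 0:
                continue
            for y in support:
                ry = q.get(y, 0.0) - overlap[y]
                if ry > 0:
                    joint[(x, y)] = joint.get((x, y), 0.0) + rx * ry / d
    return joint, d

p = {0: 0.5, 1: 0.3, 2: 0.2}
q = {0: 0.2, 1: 0.3, 2: 0.1, 3: 0.4}
joint, d = maximal_coupling(p, q)
prob_differ = sum(m for (x, y), m in joint.items() if x != y)
print(d, prob_differ)   # both equal d_TV(p, q), as in identity (5)
```

The residual masses of `p` and `q` never sit at the same point, so the off-diagonal part contributes nothing to the diagonal and the probability of disagreement equals the total variation distance.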

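The following lemmas will play a crucial role for the establishment of our main results.
The first inequality of the following lemma is Corollary 4 in Boutsikas (2006). The second
inequality of the following lemma is a direct application of Lemma 3 in Boutsikas (2006)
to the four random vectors $\mathbf{Z} + \mathbf{X}$, $\mathbf{Z} + \mathbf{Y}$, $\mathbf{W} + \mathbf{X}$, $\mathbf{W} + \mathbf{Y}$.

Lemma 4. For any random vectors $\mathbf{X}, \mathbf{Y} \in \mathbb{R}^k$ and $\mathbf{Z}, \mathbf{W} \in \mathbb{R}^r$ defined on the same
probability space, we have that

(a) $|d_{TV}(\mathcal{L}(\mathbf{Z}, \mathbf{X}), \mathcal{L}(\mathbf{Z}, \mathbf{Y})) - d_{TV}(\mathcal{L}(\mathbf{W}, \mathbf{X}), \mathcal{L}(\mathbf{W}, \mathbf{Y}))| \le 2 P(\mathbf{X} \neq \mathbf{Y}, \mathbf{Z} \neq \mathbf{W})$;
(b) $|d_{TV}(\mathcal{L}(\mathbf{Z} + \mathbf{X}), \mathcal{L}(\mathbf{Z} + \mathbf{Y})) - d_{TV}(\mathcal{L}(\mathbf{W} + \mathbf{X}), \mathcal{L}(\mathbf{W} + \mathbf{Y}))| \le 2 P(\mathbf{X} \neq \mathbf{Y}, \mathbf{Z} \neq \mathbf{W})$.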



The next inequality follows from the above lemma. It is remarkable that almost the 
same inequality can be found in Rachev (1991), page 274, and has been applied to 
derive Berry-Esseen-type results. We present an entirely different proof using maximal 
couplings. 

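Lemma 5. If the random vectors $\mathbf{X}, \mathbf{Y} \in \mathbb{R}^k$ are independent of $\mathbf{Z}, \mathbf{W} \in \mathbb{R}^r$, then

$$|d_{TV}(\mathcal{L}(\mathbf{Z} + \mathbf{X}), \mathcal{L}(\mathbf{Z} + \mathbf{Y})) - d_{TV}(\mathcal{L}(\mathbf{W} + \mathbf{X}), \mathcal{L}(\mathbf{W} + \mathbf{Y}))| \le 2 d_{TV}(\mathcal{L}\mathbf{X}, \mathcal{L}\mathbf{Y}) \, d_{TV}(\mathcal{L}\mathbf{Z}, \mathcal{L}\mathbf{W}).$$

Proof. Let $(\mathbf{X}^*, \mathbf{Y}^*)$ be a maximal coupling of $\mathcal{L}\mathbf{X}, \mathcal{L}\mathbf{Y}$ and let $(\mathbf{Z}^*, \mathbf{W}^*)$ be a maximal
coupling of $\mathcal{L}\mathbf{Z}, \mathcal{L}\mathbf{W}$. Next, let $((\mathbf{X}', \mathbf{Y}'), (\mathbf{Z}', \mathbf{W}'))$ be an independent coupling
of $\mathcal{L}(\mathbf{X}^*, \mathbf{Y}^*), \mathcal{L}(\mathbf{Z}^*, \mathbf{W}^*)$ (that is, $(\mathbf{X}', \mathbf{Y}')$ is independent of $(\mathbf{Z}', \mathbf{W}')$ and $\mathcal{L}(\mathbf{X}', \mathbf{Y}') =
\mathcal{L}(\mathbf{X}^*, \mathbf{Y}^*)$, $\mathcal{L}(\mathbf{Z}', \mathbf{W}') = \mathcal{L}(\mathbf{Z}^*, \mathbf{W}^*)$). Applying Lemma 4, we get

$$\begin{aligned}
|d_{TV}(\mathcal{L}(\mathbf{Z}' + \mathbf{X}'), \mathcal{L}(\mathbf{Z}' + \mathbf{Y}')) - d_{TV}(\mathcal{L}(\mathbf{W}' + \mathbf{X}'), \mathcal{L}(\mathbf{W}' + \mathbf{Y}'))|
&\le 2 P(\mathbf{X}' \neq \mathbf{Y}', \mathbf{Z}' \neq \mathbf{W}') = 2 P(\mathbf{X}' \neq \mathbf{Y}') P(\mathbf{Z}' \neq \mathbf{W}') \\
&= 2 P(\mathbf{X}^* \neq \mathbf{Y}^*) P(\mathbf{Z}^* \neq \mathbf{W}^*) = 2 d_{TV}(\mathcal{L}\mathbf{X}, \mathcal{L}\mathbf{Y}) \, d_{TV}(\mathcal{L}\mathbf{Z}, \mathcal{L}\mathbf{W}).
\end{aligned}$$

The obvious fact that $\mathcal{L}(\mathbf{Z}' + \mathbf{X}') = \mathcal{L}(\mathbf{Z} + \mathbf{X})$, $\mathcal{L}(\mathbf{Z}' + \mathbf{Y}') = \mathcal{L}(\mathbf{Z} + \mathbf{Y})$, $\mathcal{L}(\mathbf{W}' + \mathbf{X}') =
\mathcal{L}(\mathbf{W} + \mathbf{X})$ and $\mathcal{L}(\mathbf{W}' + \mathbf{Y}') = \mathcal{L}(\mathbf{W} + \mathbf{Y})$ completes the proof. □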

A direct application of the previous result leads to the following inequality which is
valid for any random variables $X, Y \in \mathbb{R}$ independent of another random variable $W \in \mathbb{R}$.
Specifically, if we simply set $Z = 0$ in Lemma 5 and exploit the fact that $d_{TV}(\mathcal{L}0, \mathcal{L}W) =
P(W \neq 0)$, we derive

$$d_{TV}(\mathcal{L}X, \mathcal{L}Y) \le 2 d_{TV}(\mathcal{L}X, \mathcal{L}Y) P(W \neq 0) + d_{TV}(\mathcal{L}(X + W), \mathcal{L}(Y + W))$$

which, for $P(W \neq 0) < 1/2$, implies that

$$d_{TV}(\mathcal{L}X, \mathcal{L}Y) \le (1 - 2 P(W \neq 0))^{-1} d_{TV}(\mathcal{L}(X + W), \mathcal{L}(Y + W)). \tag{6}$$

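The next lemma can be considered as a coupling inequality concerning $\zeta_2$, analogous
to Lemma 4.

Lemma 6. If $X, Y, Z, W$ are real-valued, non-negative random variables defined on the
same probability space with finite second moments and $E(X) = E(Y)$, then

$$|\zeta_2(\mathcal{L}(X + Z), \mathcal{L}(Y + Z)) - \zeta_2(\mathcal{L}(X + W), \mathcal{L}(Y + W))| \le E|(X - Y)(Z - W)|. \tag{7}$$

Proof. The distances $\zeta_2$ appearing in (7) are well defined since the random variables
$X + Z$, $Y + Z$, $X + W$ and $Y + W$ have finite second moments due to Minkowski's
inequality and $E(X + Z) = E(Y + Z)$, $E(X + W) = E(Y + W)$. Set $1_{[a \le b]} := 1$ if $a \le b$
and $1_{[a \le b]} := 0$ otherwise. As usual, $F_V$ denotes the c.d.f. of a random variable $V$. Recall
that, for $X, Y \in \mathbb{R}_+$ with $E(X) = E(Y)$, we have

$$\zeta_2(\mathcal{L}X, \mathcal{L}Y) = \int_0^\infty |E(X - s)_+ - E(Y - s)_+| \, ds = \int_0^\infty \Big| \int_s^\infty (F_X(x) - F_Y(x)) \, dx \Big| \, ds.$$

Denoting by $d$ the absolute difference in the left-hand side of (7), we have

$$\begin{aligned}
d &= \Big| \int_0^\infty \Big| \int_s^\infty (F_{X+Z}(x) - F_{Y+Z}(x)) \, dx \Big| \, ds - \int_0^\infty \Big| \int_s^\infty (F_{X+W}(x) - F_{Y+W}(x)) \, dx \Big| \, ds \Big| \\
&\le \int_0^\infty \Big| \, \Big| \int_s^\infty (F_{X+Z}(x) - F_{Y+Z}(x)) \, dx \Big| - \Big| \int_s^\infty (F_{X+W}(x) - F_{Y+W}(x)) \, dx \Big| \, \Big| \, ds.
\end{aligned}$$

Using the inequality $||a| - |b|| \le |a - b|$, $a, b \in \mathbb{R}$, we get

$$d \le \int_0^\infty \Big| \int_s^\infty (F_{X+Z}(x) - F_{Y+Z}(x)) \, dx - \int_s^\infty (F_{X+W}(x) - F_{Y+W}(x)) \, dx \Big| \, ds = \int_0^\infty |E(C_s)| \, ds \le E\Big( \int_0^\infty |C_s| \, ds \Big),$$

where

$$C_s := \int_s^\infty (1_{[X+Z \le x]} - 1_{[Y+Z \le x]}) \, dx - \int_s^\infty (1_{[X+W \le x]} - 1_{[Y+W \le x]}) \, dx.$$

Now, if $Z \ge W$, it can be verified that, for all $s \ge 0$, the two integrals in the definition of $C_s$
have the same sign and the first is at least as large as the second in absolute value. Therefore,

$$\begin{aligned}
\int_0^\infty |C_s| \, ds &= \Big| \int_0^\infty x (1_{[X+Z \le x]} - 1_{[Y+Z \le x]}) \, dx \Big| - \Big| \int_0^\infty x (1_{[X+W \le x]} - 1_{[Y+W \le x]}) \, dx \Big| \\
&= \tfrac{1}{2} \big( |(X + Z)^2 - (Y + Z)^2| - |(X + W)^2 - (Y + W)^2| \big) \\
&= \tfrac{1}{2} \big( |(X - Y)(X + Y + 2Z)| - |(X - Y)(X + Y + 2W)| \big) \\
&= |X - Y| (Z - W).
\end{aligned}$$

On the other hand, if $Z < W$, then the roles of the two integrals are interchanged and we similarly derive that
$\int_0^\infty |C_s| \, ds = |X - Y| (W - Z)$. Hence, $\int_0^\infty |C_s| \, ds = |(X - Y)(W - Z)|$ and the proof is
completed. □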



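A direct corollary of Lemma 6 is the following result which will prove useful when
dealing with $k$-dependent sequences of random variables.

Corollary 7. If the random variables $X_1, X_2, \ldots, X_i \in \mathbb{R}_+$ are $k$-dependent with
$E(X_j^2) < \infty$, $j = 1, 2, \ldots, i$, and $1 \le i - k + 1$, then

$$\zeta_2\Big( \mathcal{L}\Big( \sum_{j=1}^{i} X_j \Big), \mathcal{L}\Big( \sum_{j=1}^{i-1} X_j + X_i^\perp \Big) \Big) \le \sum_{j=i-k+1}^{i-1} (E(X_i X_j) + E(X_i) E(X_j)),$$

where $X_i^\perp$ is a random variable independent of all $X_j$, $j = 1, 2, \ldots, i$, with $\mathcal{L}X_i = \mathcal{L}X_i^\perp$.

Proof. Set $X_{a,b} := \sum_{j=a}^{b} X_j$. Applying Lemma 6 with $X = X_i$, $Z = X_{1,i-1}$, $Y = X_i^\perp$, $W =
X_{1,i-k}$, we obtain

$$\begin{aligned}
|\zeta_2(\mathcal{L}X_{1,i}, \mathcal{L}(X_{1,i-1} + X_i^\perp)) - \zeta_2(\mathcal{L}(X_{1,i-k} + X_i), \mathcal{L}(X_{1,i-k} + X_i^\perp))|
&\le E|(X_i - X_i^\perp)(X_{1,i-1} - X_{1,i-k})| = E|(X_i - X_i^\perp) X_{i-k+1,i-1}| \\
&\le \sum_{j=i-k+1}^{i-1} (E(X_i X_j) + E(X_i) E(X_j)).
\end{aligned}$$

Since $X_{1,i-k}$ and $X_i$ are independent, $X_{1,i-k}$ and $X_i^\perp$ are independent, and $\mathcal{L}X_i = \mathcal{L}X_i^\perp$,
we conclude that $\mathcal{L}(X_{1,i-k} + X_i) = \mathcal{L}(X_{1,i-k} + X_i^\perp)$ and hence we obtain the desired
inequality. □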



As will be seen in the next section, Lemmas 1 and 5 are sufficient for proving compound
Poisson approximation results for sums of independent random variables incorporating
a smoothness factor. In the case of sums of dependent random variables, though, the
following, additional, lemma is needed. The question addressed here is the following:
given a random variable $X$ and a random vector $\mathbf{Z}$, can we construct (on the same
probability space as $X, \mathbf{Z}$) another random variable $Y$ with a given p.d.f. $f$ such that $Y$
is independent of $\mathbf{Z}$ and $(X, \mathbf{Z})$, $(Y, \mathbf{Z})$ are maximally coupled? In this situation, we could
loosely say that we wish to construct a random variable $Y$ (with a given distribution)
that resembles $X$ as far as possible, while remaining independent of $\mathbf{Z}$. Again, it suffices
to restrict our analysis to the discrete case.

Lemma 8. Let $X \in \mathbb{Z}$, $\mathbf{Z} \in \mathbb{Z}^k$ be a random variable and a random vector, respectively
(defined on the same probability space) and let $f: \mathbb{Z} \to \mathbb{R}_+$ be some given discrete p.d.f.
Denote by $U$ a random variable independent of $X, \mathbf{Z}$ that follows the uniform distribution
on $(0, 1)$. Then,

(a) there exists a function $g: \mathbb{R}^{2+k} \to \mathbb{R}$ such that the random variable $Y = g(U, X, \mathbf{Z})$
has p.d.f. $f$, $Y$ is independent of $\mathbf{Z}$ and

$$d_{TV}(\mathcal{L}(X, \mathbf{Z}), \mathcal{L}(Y, \mathbf{Z})) = P((X, \mathbf{Z}) \neq (Y, \mathbf{Z})) = P(X \neq Y),$$

in other words, $(X, \mathbf{Z})$, $(Y, \mathbf{Z})$ are maximally coupled;
(b) there exists a function $g': \mathbb{R}^2 \to \mathbb{R}$ such that the random variable $Y' = g'(U, X)$
has p.d.f. $f$ and $(X, Y')$ are maximally coupled, that is, $d_{TV}(\mathcal{L}X, \mathcal{L}Y') = P(X \neq Y')$.

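Proof. (a) Here, we develop a constructive proof. Denote by $(\Omega, \mathcal{A}, P)$ the probability
space on which $X, \mathbf{Z}, U$ are defined and let $f_{X|\mathbf{Z}}(\cdot | \mathbf{z}) = f_{X,\mathbf{Z}}(\cdot, \mathbf{z}) / f_{\mathbf{Z}}(\mathbf{z})$ be the conditional
p.d.f. of $X$ given $\mathbf{Z} = \mathbf{z}$. Consider the probability measures $P_1^{\mathbf{z}}, P_2$ on the measurable
space $(\mathbb{Z}, 2^{\mathbb{Z}})$ generated by $f_{X|\mathbf{Z}}(\cdot | \mathbf{z})$ and $f$, respectively. According to (5), there exists
a maximal coupling of $P_1^{\mathbf{z}}, P_2$. Denote by $h_{\mathbf{z}}: \mathbb{Z}^2 \to \mathbb{R}_+$ the joint p.d.f. corresponding to
this maximal coupling. It follows that $\sum_{x \in \mathbb{Z}} h_{\mathbf{z}}(x, y) = f(y)$, $\sum_{y \in \mathbb{Z}} h_{\mathbf{z}}(x, y) = f_{X|\mathbf{Z}}(x | \mathbf{z})$
and

$$d_{TV}(\mathcal{L}(X | \mathbf{Z} = \mathbf{z}), P_2) = d_{TV}(P_1^{\mathbf{z}}, P_2) = 1 - \sum_{x \in \mathbb{Z}} h_{\mathbf{z}}(x, x).$$

We now construct $Y$ as follows. For every $x \in \mathbb{Z}$, $\mathbf{z} \in \mathbb{Z}^k$, consider the c.d.f.

$$H_{x,\mathbf{z}}(y) := \sum_{w \le y} \frac{h_{\mathbf{z}}(x, w)}{f_{X|\mathbf{Z}}(x | \mathbf{z})}$$

and set $Y(\omega) := H^{-1}_{X(\omega), \mathbf{Z}(\omega)}(U(\omega))$, $\omega \in \Omega$, where $H^{-1}_{x,\mathbf{z}}(y)$ denotes the generalized
inverse of $H_{x,\mathbf{z}}(y)$, that is, $H^{-1}_{x,\mathbf{z}}(y) = \inf\{w: H_{x,\mathbf{z}}(w) \ge y\}$. The function $f_{X,Y,\mathbf{Z}}(x, y, \mathbf{z}) :=
h_{\mathbf{z}}(x, y) f_{\mathbf{Z}}(\mathbf{z})$ is a multivariate discrete p.d.f. and it can be verified that $Y$ and $(X, Y, \mathbf{Z})$
have p.d.f. $f$ and $f_{X,Y,\mathbf{Z}}$, respectively. Indeed,

$$P(X = x, Y \le y, \mathbf{Z} = \mathbf{z}) = P(X = x, H^{-1}_{x,\mathbf{z}}(U) \le y, \mathbf{Z} = \mathbf{z}) = H_{x,\mathbf{z}}(y) P(X = x, \mathbf{Z} = \mathbf{z})$$

and thus, for all $x, y, \mathbf{z}$,

$$P(X = x, Y = y, \mathbf{Z} = \mathbf{z}) = \frac{h_{\mathbf{z}}(x, y)}{f_{X|\mathbf{Z}}(x | \mathbf{z})} P(X = x, \mathbf{Z} = \mathbf{z}) = h_{\mathbf{z}}(x, y) f_{\mathbf{Z}}(\mathbf{z}) = f_{X,Y,\mathbf{Z}}(x, y, \mathbf{z}).$$

Also, note that, for all $y, \mathbf{z}$,

$$P(Y = y, \mathbf{Z} = \mathbf{z}) = \sum_{x \in \mathbb{Z}} f_{X,Y,\mathbf{Z}}(x, y, \mathbf{z}) = \sum_{x \in \mathbb{Z}} h_{\mathbf{z}}(x, y) f_{\mathbf{Z}}(\mathbf{z}) = f(y) f_{\mathbf{Z}}(\mathbf{z}),$$

which implies that $Y$ is independent of $\mathbf{Z}$. Furthermore, we derive that, for all $\mathbf{z}$,

$$P(X \neq Y | \mathbf{Z} = \mathbf{z}) = 1 - \sum_{x \in \mathbb{Z}} h_{\mathbf{z}}(x, x) = d_{TV}(\mathcal{L}(X | \mathbf{Z} = \mathbf{z}), \mathcal{L}Y)$$

and, therefore,

$$\begin{aligned}
P(X \neq Y) &= \sum_{\mathbf{z} \in \mathbb{Z}^k} P(X \neq Y | \mathbf{Z} = \mathbf{z}) f_{\mathbf{Z}}(\mathbf{z}) = \sum_{\mathbf{z} \in \mathbb{Z}^k} d_{TV}(\mathcal{L}(X | \mathbf{Z} = \mathbf{z}), \mathcal{L}Y) f_{\mathbf{Z}}(\mathbf{z}) \\
&= \frac{1}{2} \sum_{\mathbf{z} \in \mathbb{Z}^k} \sum_{w \in \mathbb{Z}} |P(X = w | \mathbf{Z} = \mathbf{z}) - P(Y = w)| f_{\mathbf{Z}}(\mathbf{z}) \\
&= \frac{1}{2} \sum_{\mathbf{z} \in \mathbb{Z}^k} \sum_{w \in \mathbb{Z}} |P(X = w, \mathbf{Z} = \mathbf{z}) - P(Y = w) f_{\mathbf{Z}}(\mathbf{z})| = d_{TV}(\mathcal{L}(X, \mathbf{Z}), \mathcal{L}(Y, \mathbf{Z})).
\end{aligned}$$

□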
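
The inverse-c.d.f. step of this construction can be sketched in a few lines for the simpler setting of part (b), where no vector $\mathbf{Z}$ is present. The Python code below is an illustration under assumptions: the joint p.d.f. `h` of a maximal coupling is taken as given (here a small hand-written table for a variable $X$ with p.d.f. $(0.5, 0.3, 0.2)$ on $\{0, 1, 2\}$ and a target p.d.f. $(0.2, 0.3, 0.1, 0.4)$ on $\{0, 1, 2, 3\}$), and `make_g` is a hypothetical helper name.

```python
def make_g(h):
    """Given a joint p.d.f. h[(x, y)], return g(u, x) = inf{w : H_x(w) >= u},
    where H_x is the conditional c.d.f. of the second coordinate given that
    the first coordinate equals x."""
    rows = {}
    for (x, y), mass in h.items():
        rows.setdefault(x, []).append((y, mass))

    def g(u, x):
        row = sorted(rows[x])
        total = sum(m for _, m in row)      # marginal probability of x
        cum = 0.0
        for y, m in row:
            cum += m / total                # H_x(y)
            if cum >= u:
                return y
        return row[-1][0]                   # guard against rounding when u is close to 1

    return g

# Joint p.d.f. of a maximal coupling (example values, assumed given).
h = {(0, 0): 0.2, (0, 3): 0.3, (1, 1): 0.3, (2, 2): 0.1, (2, 3): 0.1}
g = make_g(h)

prob_differ = sum(m for (x, y), m in h.items() if x != y)
print(g(0.39, 0), g(0.41, 0), prob_differ)
```

With $U$ uniform on $(0, 1)$ and independent of $X$, the variable $g(U, X)$ has conditional p.d.f. $h(x, \cdot) / f_X(x)$ given $X = x$, so the pair $(X, g(U, X))$ has joint p.d.f. `h`.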



3. Compound Poisson approximation for sums of 
independent random variables 

Let $X_1, X_2, \ldots, X_n$ be a sequence of independent random variables which take values in
$\mathbb{Z}_+$. We are now ready to exploit the results of the previous section (specifically Lemmas
1 and 5) to derive a simple and, in most cases, sharp upper bound for the total variation
distance between the distribution of the sum $\sum_{i=1}^{n} X_i$ and an appropriate compound
Poisson distribution. Before we present this bound, we recall that (see Boutsikas and
Vaggelatou (2002))

$$\zeta_2(\mathcal{L}X_i, \mathrm{CP}(p_i, G_i)) = \tfrac{1}{2} E(X_i)^2, \tag{8}$$

with $p_i := P(X_i \neq 0)$, $\lambda = \sum_{i=1}^{n} p_i$ and $G_i(x) = P(X_i \le x | X_i \neq 0)$. Naturally, the bound
of the following theorem is useful (that is, it tends to 0) when $p_i \approx 0$. Hence, the condition
$p_i < \log 2 \approx 0.693$ imposed below does not affect the generality of the result. One could
easily modify the upper bound (making it a little bit more complicated) so as to eliminate
this restriction, but this modification would lead to no practical gain.

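Theorem 9. Let $X_1, X_2, \ldots, X_n$ be a sequence of independent random variables (with
finite second moments) taking values in $\mathbb{Z}_+$ and $P(X_i \neq 0) =: p_i < \log 2$ ($\approx 0.693$). Then,

$$d_{TV}\Big( \mathcal{L}\Big( \sum_{i=1}^{n} X_i \Big), \mathrm{CP}(\lambda, F) \Big) \le \Big( \sum_{i=1}^{n} p_i^2 \Big)^2 + \frac{1}{4} \|\Delta^2 f_{\mathrm{CP}(\lambda, F)}\|_1 \sum_{i=1}^{n} \frac{E(X_i)^2}{1 - 2(1 - e^{-p_i})} =: UB_{CP},$$

where $\lambda = \sum_{i=1}^{n} p_i$, $F(x) = \sum_{i=1}^{n} \frac{p_i}{\lambda} G_i(x)$ and $G_i(x) = P(X_i \le x | X_i \neq 0)$, $x \in \mathbb{Z}$.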

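Proof. Let $N_1, N_2, \ldots, N_n$ be independent random variables following the compound
Poisson distribution with parameters $(p_1, G_1), (p_2, G_2), \ldots, (p_n, G_n)$, respectively. We
apply the triangle inequality to get the following Lindeberg decomposition of the distance
of interest,

$$d_{TV}\Big( \mathcal{L}\Big( \sum_{i=1}^{n} X_i \Big), \mathcal{L}\Big( \sum_{i=1}^{n} N_i \Big) \Big) \le \sum_{m=1}^{n} d_{TV}\Big( \mathcal{L}\Big( \sum_{i=1}^{m} X_i + \sum_{i=m+1}^{n} N_i \Big), \mathcal{L}\Big( \sum_{i=1}^{m-1} X_i + \sum_{i=m}^{n} N_i \Big) \Big). \tag{9}$$

Furthermore, if we set

$$\mathbf{X}_m := X_m + \sum_{i=m+1}^{n} N_i, \quad \mathbf{Y}_m := N_m + \sum_{i=m+1}^{n} N_i, \quad \mathbf{Z}_m := \sum_{i=1}^{m-1} X_i, \quad \mathbf{W}_m := \sum_{i=1}^{m-1} N_i,$$

then the random variables $\mathbf{X}_m, \mathbf{Y}_m$ are independent of $\mathbf{Z}_m, \mathbf{W}_m$ and a direct application
of Lemma 5 to $\mathbf{X}_m, \mathbf{Y}_m, \mathbf{Z}_m, \mathbf{W}_m$ reveals that

$$d_{TV}\Big( \mathcal{L}\Big( \sum_{i=1}^{m} X_i + \sum_{i=m+1}^{n} N_i \Big), \mathcal{L}\Big( \sum_{i=1}^{m-1} X_i + \sum_{i=m}^{n} N_i \Big) \Big) \le 2 a_m b_m + c_m, \tag{10}$$

where

$$\begin{aligned}
a_m &:= d_{TV}\Big( \mathcal{L}\Big( X_m + \sum_{i=m+1}^{n} N_i \Big), \mathcal{L}\Big( N_m + \sum_{i=m+1}^{n} N_i \Big) \Big), \\
b_m &:= d_{TV}\Big( \mathcal{L}\Big( \sum_{i=1}^{m-1} X_i \Big), \mathcal{L}\Big( \sum_{i=1}^{m-1} N_i \Big) \Big), \\
c_m &:= d_{TV}\Big( \mathcal{L}\Big( \sum_{i=1}^{n} N_i - N_m + X_m \Big), \mathcal{L}\Big( \sum_{i=1}^{n} N_i \Big) \Big).
\end{aligned}$$

Next, let $N_m^\perp$ be a random variable independent of all $N_i, X_i$ with $\mathcal{L}N_m^\perp = \mathcal{L}N_m$. Applying
inequality (6) with $W = N_m^\perp$, we derive

$$c_m \le (1 - 2(1 - e^{-p_m}))^{-1} d_{TV}\Big( \mathcal{L}\Big( \sum_{i=1}^{n} N_i - N_m + N_m^\perp + X_m \Big), \mathcal{L}\Big( \sum_{i=1}^{n} N_i + N_m^\perp \Big) \Big)$$

since $P(N_m^\perp \neq 0) = 1 - e^{-p_m}$. Furthermore, Lemma 1 yields

$$c_m \le \frac{1}{1 - 2(1 - e^{-p_m})} \cdot \frac{1}{2} \|\Delta^2 f_{\mathrm{CP}(\lambda, F)}\|_1 \, \zeta_2(\mathcal{L}X_m, \mathcal{L}N_m) = \frac{\|\Delta^2 f_{\mathrm{CP}(\lambda, F)}\|_1}{4 (1 - 2(1 - e^{-p_m}))} E(X_m)^2, \tag{11}$$

where we have used (8) to get that $\zeta_2(\mathcal{L}X_m, \mathcal{L}N_m) = \zeta_2(\mathcal{L}X_m, \mathrm{CP}(p_m, G_m)) = \frac{1}{2} E(X_m)^2$.
On the other hand, we can easily bound the quantities $a_m, b_m$ as follows:

$$a_m \le d_{TV}(\mathcal{L}X_m, \mathcal{L}N_m) \le p_m^2 \quad \text{and} \quad b_m \le \sum_{i=1}^{m-1} d_{TV}(\mathcal{L}X_i, \mathcal{L}N_i) \le \sum_{i=1}^{m-1} p_i^2. \tag{12}$$

Finally, combining (9)-(12), we get

$$d_{TV}\Big( \mathcal{L}\Big( \sum_{i=1}^{n} X_i \Big), \mathcal{L}\Big( \sum_{i=1}^{n} N_i \Big) \Big) \le \sum_{m=1}^{n} 2 a_m b_m + \|\Delta^2 f_{\mathrm{CP}(\lambda, F)}\|_1 \sum_{m=1}^{n} \frac{E(X_m)^2}{4 (1 - 2(1 - e^{-p_m}))},$$

which readily leads to the desired inequality since

$$\sum_{m=1}^{n} 2 a_m b_m \le \sum_{m=1}^{n} 2 p_m^2 \sum_{i=1}^{m-1} p_i^2 = \sum_{m=1}^{n} \sum_{i=1}^{m-1} p_m^2 p_i^2 + \sum_{m=1}^{n} p_m^2 \sum_{i=m+1}^{n} p_i^2 \le \Big( \sum_{m=1}^{n} p_m^2 \Big)^2.$$

□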

A straightforward corollary of the above theorem arises when we consider independent
Bernoulli random variables. In this case, the distribution of the sum of the binary
sequence $X_1, X_2, \ldots, X_n$ is also known as a Poisson binomial or generalized binomial
distribution and the approximating compound Poisson distribution naturally reduces to
an ordinary Poisson distribution.

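Corollary 10. Let $X_1, X_2, \ldots, X_n$ be a sequence of independent Bernoulli random variables
with $P(X_i = 1) = p_i < \log 2$, $i = 1, 2, \ldots, n$. Then,

$$d_{TV}\Big(\mathcal{L}\Big(\sum_{i=1}^n X_i\Big), Po(\lambda)\Big) \le \Big(\sum_{i=1}^n p_i^2\Big)^2 + \frac{1}{4}\|\Delta^2 f_{Po(\lambda)}\|_1 \sum_{i=1}^n \frac{p_i^2}{1 - 2(1 - e^{-p_i})} := UB_{Po},$$

where $\lambda = \sum_{i=1}^n p_i$ and $\|\Delta^2 f_{Po(\lambda)}\|_1$ is given in Proposition 3.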
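As a rough numerical illustration, the bound $UB_{Po}$ of Corollary 10 can be evaluated directly from the Poisson probabilities. The sketch below assumes that $\|\Delta^2 f_{Po(\lambda)}\|_1$ is the $\ell_1$-norm of the second difference of the Poisson probability function (with $f(k) = 0$ for $k < 0$) and computes it by brute force on a truncated support instead of using the closed form of Proposition 3. The helper names (`poisson_pmf`, `second_difference_l1`, `ub_poisson`) and the truncation rule are illustrative assumptions, not part of the paper.

```python
import math


def poisson_pmf(lam, kmax):
    """Return [f(0), ..., f(kmax)] for Poisson(lam), using f(k) = f(k-1) * lam / k."""
    f = [math.exp(-lam)]
    for k in range(1, kmax + 1):
        f.append(f[-1] * lam / k)
    return f


def second_difference_l1(f):
    """l1-norm of the second difference of a pmf on {0, 1, ...}.

    The pmf is padded with zeros on both sides, so f(k) = 0 for k < 0
    and beyond the truncation point.
    """
    g = [0.0, 0.0] + list(f) + [0.0, 0.0]
    return sum(abs(g[k] - 2.0 * g[k - 1] + g[k - 2]) for k in range(2, len(g)))


def ub_poisson(ps):
    """Evaluate UB_Po for independent Bernoulli(p_i) summands (all p_i < log 2)."""
    if max(ps) >= math.log(2):
        raise ValueError("Corollary 10 requires p_i < log 2")
    lam = sum(ps)
    # Heuristic truncation point (an assumption): far in the right tail.
    kmax = int(lam + 12.0 * math.sqrt(lam) + 30)
    norm = second_difference_l1(poisson_pmf(lam, kmax))
    first = sum(p * p for p in ps) ** 2
    second = 0.25 * norm * sum(p * p / (1.0 - 2.0 * (1.0 - math.exp(-p))) for p in ps)
    return first + second
```

For example, `ub_poisson([0.01] * 100)` evaluates the bound for 100 Bernoulli summands with $\lambda = 1$.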

Remark 2. If we assume that $\sum_{i=1}^n p_i^2 \to 0$ as $n \to \infty$ (implying that $\max_i p_i \to 0$), the
first term $(\sum_{i=1}^n p_i^2)^2$ in the upper bound $UB_{Po}$ (Corollary 10) or in $UB_{CP}$ (Theorem 9)
tends to 0 at a faster rate than the second term and, therefore, the order of $UB_{Po}$ and
$UB_{CP}$ is the same as the order of their second term. That is, for $UB_{CP}$, we have

$$UB_{CP} \sim \begin{cases} \dfrac{1}{4}\|\Delta^2 f_{CP(\lambda,F)}\|_1 \displaystyle\sum_{i=1}^n E(X_i)^2, & \text{for } \lambda \text{ fixed}, \\[2ex] \dfrac{1}{\lambda \mu_2 \sqrt{2\pi e}} \displaystyle\sum_{i=1}^n E(X_i)^2, & \text{when } \lambda \to \infty, \end{cases}$$

where $\mu_2$ denotes the second moment of the compounding distribution $F$ (see Remark
1 above). According to Remark 1, the second asymptotic result for $UB_{CP}$ above (when
$\lambda \to \infty$) is valid when $CP(\lambda, F)$ is close to a normal distribution. Therefore, we can say
that, for independent $X_1, \ldots, X_n \in \mathbb{Z}_+$ with $E(X_i) = O(p_i)$,

$$d_{TV}\Big(\mathcal{L}\Big(\sum_{i=1}^n X_i\Big), CP(\lambda, F)\Big) = O\Big(\lambda^{-1} \sum_{i=1}^n p_i^2\Big)$$

whenever $F$ is such that $CP(\lambda, F) \approx N(\mu, \sigma^2)$ or, more generally, whenever $\|\Delta^2 f_{CP(\lambda,F)}\|_1 = O(\lambda^{-1})$.
Our approach requires the restriction $\sum_{i=1}^n p_i^2 \to 0$ (not only $\max_i p_i \to 0$), but
we have reasons to believe (see Remark 3) that this restriction is superfluous and can
be weakened. This offers a clue to a question raised by Le Cam (1960) (see also Barbour
and Utev (1999) and Roos (2003)) about the form of the compounding distribution $F$
that would permit us to achieve a compound Poisson approximation error order similar
to that obtained for Poisson approximation, that is, $\frac{1}{\lambda}\sum_{i=1}^n p_i^2$.

We also point out that the upper bound $UB_{Po}$ of Corollary 10 for the Poisson approximation
is similar to the one derived by Deheuvels and Pfeifer (1986) (see also Deheuvels,
Pfeifer and Puri (1989)), who employed an entirely different method. The factor
$\|\Delta^2 f_{Po(\lambda)}\|_1/4$ appears in the bounds of these articles (in an equivalent form, not recognized
as being the $L_1$-norm of $\Delta^2 f_{Po(\lambda)}/4$) and was proven to be optimal (that is, $d_{TV} \sim UB_{Po}$;
see Deheuvels and Pfeifer (1986)) under the usual asymptotic assumptions. The
same argument is possibly true for the more general smoothness factor $\|\Delta^2 f_{CP(\lambda,F)}\|_1$.
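The closeness of the distance to the second term of the bound can be checked numerically in the i.i.d. Bernoulli case. The following sketch computes the exact total variation distance between a binomial distribution and the Poisson distribution with the same mean, and the leading term $\frac{1}{4}\|\Delta^2 f_{Po(\lambda)}\|_1 \, n p^2$. It assumes the norm is the $\ell_1$-norm of the second difference of the Poisson probability function; all function names are hypothetical and the truncation rule is a heuristic.

```python
import math


def poisson_pmf(lam, kmax):
    """Return [f(0), ..., f(kmax)] for Poisson(lam)."""
    f = [math.exp(-lam)]
    for k in range(1, kmax + 1):
        f.append(f[-1] * lam / k)
    return f


def binomial_pmf(n, p):
    """Return [b(0), ..., b(n)] for Binomial(n, p)."""
    return [math.comb(n, k) * p**k * (1.0 - p) ** (n - k) for k in range(n + 1)]


def second_difference_l1(f):
    """l1-norm of the second difference of a pmf, zero-padded on both sides."""
    g = [0.0, 0.0] + list(f) + [0.0, 0.0]
    return sum(abs(g[k] - 2.0 * g[k - 1] + g[k - 2]) for k in range(2, len(g)))


def tv_binomial_poisson(n, p):
    """Exact d_TV between Binomial(n, p) and Poisson(n * p), up to rounding."""
    b = binomial_pmf(n, p)
    f = poisson_pmf(n * p, n)
    # Poisson mass beyond n, where the binomial has no mass.
    tail = max(0.0, 1.0 - sum(f))
    return 0.5 * (sum(abs(x - y) for x, y in zip(b, f)) + tail)


def leading_term(n, p):
    """Second term of UB_Po without the factor 1 / (1 - 2(1 - e^{-p}))."""
    lam = n * p
    kmax = int(lam + 12.0 * math.sqrt(lam) + 30)  # heuristic truncation
    return 0.25 * second_difference_l1(poisson_pmf(lam, kmax)) * n * p * p
```

Comparing `tv_binomial_poisson(n, p)` with `leading_term(n, p)` for small $p$ gives a quick feel for how tight the second term is in this special case.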



Remark 3. In the proof of Theorem 9, the quantity $2\sum_{m=1}^n a_m b_m$ (see relation (12))
was bounded rather crudely in order to obtain a closed form upper bound. This resulted
in a simple-in-form first term, namely $(\sum_{i=1}^n p_i^2)^2$, in $UB_{CP}$. If $\sum_{i=1}^n p_i^2 \approx 0$, then this
term does not have a significant effect on $UB_{CP}$, but if $\sum_{i=1}^n p_i^2$ is not close to 0, then it
may result in a very crude upper bound.

Nevertheless, concerning the Poisson case, if we possessed a simple-in-form upper
bound for $\|\Delta^2 f_{Po(\lambda)}\|_1$, we could obtain a better (smaller) bound for the quantity
$2\sum_{m=1}^n a_m b_m$. To get an idea of how this can be done, we shall treat the simplest
case where $X_1, X_2, \ldots, X_n$ are i.i.d. ($p_i = p$) Bernoulli random variables. Recall that, in
general, $\frac{1}{4}\|\Delta^2 f_{Po(\lambda)}\|_1 \le (1 \wedge \frac{1}{3\lambda})$ and, therefore,

a m < 1/2||A 2 / S? _ N M2(X m ,N m ) < (lA 1 )p 2 , 
! - m+1 \ 3(n — m)p J 
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O m < —, TT > Pi < P- 

(m — l)p f— 

Assuming that A > 1/3 +p and taking into account that Yl7=ni 1 < l g( Tr-T )> ^ ne sum 
2J2m=i a mb m is bounded above by 

^-^ \ 6(n — m)p / z — ' 3(n — m) z — ' 

m=l v v >F / m=l v ; m=Ln-l/(3p)J+l 



(13) 



under the assumption $p < 1/3$. The latter reveals that, when $p_i = p$ and $\lambda > 1/3 + p$, the
term $(\sum_{i=1}^n p_i^2)^2 = \lambda^2 p^2$ in Corollary 10 can be substantially reduced to (13), implying
that $UB_{Po} \approx \frac{2}{3}p^2(\log 3\lambda + 1) + \frac{1}{\sqrt{2\pi e}}\,p$. The above bound could also be reduced (requiring
more complicated algebraic manipulations) in the case of non-i.i.d. Bernoulli random
variables. For a more general case though, for example, in a compound Poisson approximation,
we must first find a suitable general upper bound for $\|\Delta^2 f_{CP(\lambda,F)}\|_1$ which, at
the moment, does not seem an easy task and is left for future work.



4. Compound Poisson approximation for sums of
k-dependent random variables

In this section, a more general setup is assumed. We are now interested in approximating
the distribution of the sum $X_1 + \cdots + X_n$ when the k-dependent $X_i$'s are rarely non-zero.
Naturally, we expect that this distribution converges weakly to an appropriate compound
Poisson distribution.

Following the same methodological steps as in the proof of Theorem 9, we offer a
bound that includes a smoothness factor analogous to a Stein factor. The appearance of
such a factor is perhaps the first (for sums of dependent random variables) outside the
Stein-Chen method. As was mentioned in the Introduction, the smoothness factor we
derive is simpler, seems more natural and is better than the corresponding Stein factors.
On the other hand, inevitably, an undesired term analogous to $(\sum p_i^2)^2$ of Theorem 9
again appears in the upper bounds.

For convenience, we shall focus our approach on a sequence of independent random
variables $Z_1, Z_2, \ldots$ defined over a probability space $(\Omega, \mathcal{A}, P)$ and consider k-dependent
random variables of the form $h_i(Z_i, \ldots, Z_{i+k-1})$. This approach is not restrictive since, in
almost all applications, local dependency arises in this setup (for example, runs or scan
statistics, patterns, reliability theory, graph theory problems, moving sums, etcetera).
Specifically, let $Z_1, Z_2, \ldots$ be independent random variables and also let

$$X_i = h_i(Z_i, \ldots, Z_{i+k-1}), \quad i = 1, 2, \ldots, \qquad (14)$$

be a sequence of non-negative, integer-valued random variables, generated by some
measurable functions $h_i : \mathbb{R}^k \to \mathbb{Z}_+$. The above definition implies that $X_i$ is independent
of $X_1, \ldots, X_{i-k}$ and $X_{i+k}, X_{i+k+1}, \ldots$. Therefore, $X_1, X_2, \ldots$ are "k-dependent" random
variables (independent random variables can be considered as 1-dependent). Naturally,
the bound offered tends to 0, provided that $P(X_i \neq 0) = p_i \approx 0$. Hence, the condition
$\max_i \sum_{j=i-3k+3}^{i} p_j < \log 2 \approx 0.693$ does not affect the generality of the result. We assume
that $X_i = 0$ for all $i < 1$.

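Theorem 11. Let $X_1, X_2, \ldots, X_n \in \mathbb{Z}_+$ be k-dependent random variables (defined as in
(14)) with finite second moments. Let $N_1, \ldots, N_n$ be independent random variables (also
independent of $Z_i$) with $N_i$ following the $CP(p_i, G_i)$ distribution, where $G_i(x) = P(X_i \le x \mid X_i \neq 0)$,
$x \in \mathbb{R}$ and $p_i = P(X_i \neq 0)$. Then, for $m := \max_i \sum_{j=i-3k+3}^{i} p_j < \log 2$,

$$d_{TV}\Big(\mathcal{L}\Big(\sum_{i=1}^n X_i\Big), CP(\lambda_n, F_n)\Big) \le C_n + \frac{\frac{1}{2}\|\Delta^2 f_{CP(\lambda_n,F_n)}\|_1}{1 - 2(1 - e^{-m})} \sum_{i=1}^{n} \zeta_2\Big(\mathcal{L}\Big(\sum_{j=i-2k+2}^{i} X_j\Big), \mathcal{L}\Big(\sum_{j=i-2k+2}^{i-1} X_j + N_i\Big)\Big) := UB'_{CP},$$

where

$$C_n := 2\sum_{i=1}^{n} \Big( d_{TV}\Big(\mathcal{L}\Big(\sum_{j=1}^{i-3k+2} X_j\Big), \mathcal{L}\Big(\sum_{j=1}^{i-3k+2} N_j\Big)\Big) + \sum_{j=i-3k+3}^{i-2k+1} p_j \Big) \times \Big( 2P((X_{i-k+1}, \ldots, X_{i-1}) \neq 0, X_i \neq 0) + 2p_i \sum_{j=i-k+1}^{i-1} p_j + p_i^2 \Big)$$

and $\lambda_n = \sum_{i=1}^n p_i$, $F_n = \sum_{i=1}^n \frac{p_i}{\lambda_n} G_i$.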

Proof. In order to simplify notation, set $X_{a,b} := \sum_{i=a}^{b} X_i$, $\mathbf{X}_{a,b} := (X_a, X_{a+1}, \ldots, X_b)$,
$N_{a,b} := \sum_{i=a}^{b} N_i$ and $\mathbf{Z}_{a,b} := (Z_a, Z_{a+1}, \ldots, Z_b)$. Also, let $U_i, U_i^*$, $i = 1, 2, \ldots, n$, be independent
random variables, also independent of $Z_i, N_i$, following the uniform distribution
on (0,1).

Fix $i \in \{1, 2, \ldots, n\}$. In order to avoid a special treatment for small values of $i$ due to
edge effects and to preserve a unified analysis for all $i$, we simply assume that
$X_j = N_j = Z_j = 0$ for $j \le 0$. According to Lemma 8(b)
(with $f$ being the p.d.f. of $N_{1,i-3k+2}$), there exists a random variable $N^*_{1,i-3k+2} =
g_1(U_{i-3k+2}, X_{1,i-3k+2})$ such that $\mathcal{L}N^*_{1,i-3k+2} = \mathcal{L}N_{1,i-3k+2}$ and $(X_{1,i-3k+2}, N^*_{1,i-3k+2})$
are maximally coupled, that is,

$$d_{TV}(\mathcal{L}X_{1,i-3k+2}, \mathcal{L}N^*_{1,i-3k+2}) = P(X_{1,i-3k+2} \neq N^*_{1,i-3k+2}).$$

Moreover, according to Lemma 8(a) (with $f$ now being the p.d.f. of $N_i$), there exists a
random variable

$$N_i^* = g_2(U_i^*, X_i, \mathbf{X}_{i-k+1,i-1}, \mathbf{Z}_{i-k+1,i-1})$$

such that $\mathcal{L}N_i^* = \mathcal{L}N_i$, $N_i^*$ is independent of the vector $(\mathbf{X}_{i-k+1,i-1}, \mathbf{Z}_{i-k+1,i-1})$ and

$$d_{TV}(\mathcal{L}(X_i, \mathbf{X}_{i-k+1,i-1}, \mathbf{Z}_{i-k+1,i-1}), \mathcal{L}(N_i^*, \mathbf{X}_{i-k+1,i-1}, \mathbf{Z}_{i-k+1,i-1})) = P(X_i \neq N_i^*). \qquad (15)$$

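It is easy to check that, as defined, $N_i^*$ is also independent of $\mathbf{X}_{1,i-1}$. Indeed, if we set
$\mathbf{Y} := (\mathbf{Z}_{i-k+1,i-1}, \mathbf{X}_{i-k+1,i-1})$, for all $x, \mathbf{x}$, we have that

$$P(N_i^* = x, \mathbf{X}_{1,i-1} = \mathbf{x}) = \sum_{\mathbf{y}} P(N_i^* = x, \mathbf{Y} = \mathbf{y}, \mathbf{X}_{1,i-1} = \mathbf{x}).$$

We may write $\mathbf{X}_{1,i-1} = g(\mathbf{Z}_{1,i-k}, \mathbf{Y})$ for some appropriate function $g$ taking values in
$\mathbb{Z}^{i-1}$. Hence, the above sum is equal to

$$\begin{aligned}
&\sum_{\mathbf{y}} P(g_2(U_i^*, X_i, \mathbf{Y}) = x, \mathbf{Y} = \mathbf{y}, g(\mathbf{Z}_{1,i-k}, \mathbf{y}) = \mathbf{x}) \\
&\quad = \sum_{\mathbf{y}} P(g_2(U_i^*, X_i, \mathbf{Y}) = x, \mathbf{Y} = \mathbf{y})\, P(g(\mathbf{Z}_{1,i-k}, \mathbf{y}) = \mathbf{x}) \\
&\quad = \sum_{\mathbf{y}} P(N_i^* = x)\, P(\mathbf{Y} = \mathbf{y})\, P(g(\mathbf{Z}_{1,i-k}, \mathbf{y}) = \mathbf{x}) \\
&\quad = P(N_i^* = x) \sum_{\mathbf{y}} P(\mathbf{Y} = \mathbf{y}, g(\mathbf{Z}_{1,i-k}, \mathbf{y}) = \mathbf{x}) \\
&\quad = P(N_i^* = x) \sum_{\mathbf{y}} P(\mathbf{Y} = \mathbf{y}, \mathbf{X}_{1,i-1} = \mathbf{x}) = P(N_i^* = x)\, P(\mathbf{X}_{1,i-1} = \mathbf{x}),
\end{aligned}$$

which is valid for all $x, \mathbf{x}$ and thus $N_i^*$ is independent of $\mathbf{X}_{1,i-1}$.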
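Now, applying the inequality (see Lemma 4)

$$d_{TV}(\mathcal{L}(Z + X), \mathcal{L}(Z + Y)) \le 2P(Z \neq W, X \neq Y) + d_{TV}(\mathcal{L}(W + X), \mathcal{L}(W + Y))$$

with $Z = X_{1,i-1}$, $X = X_i + N_{i+1,n}$, $Y = N_i^* + N_{i+1,n}$, $W = N^*_{1,i-3k+2} + X_{i-2k+2,i-1}$, we
obtain

$$\begin{aligned}
&d_{TV}(\mathcal{L}(X_{1,i-1} + X_i + N_{i+1,n}), \mathcal{L}(X_{1,i-1} + N_i^* + N_{i+1,n})) \\
&\quad \le 2P(X_{1,i-1} \neq N^*_{1,i-3k+2} + X_{i-2k+2,i-1},\; X_i + N_{i+1,n} \neq N_i^* + N_{i+1,n}) \\
&\qquad + d_{TV}(\mathcal{L}(N^*_{1,i-3k+2} + X_{i-2k+2,i-1} + X_i + N_{i+1,n}), \mathcal{L}(N^*_{1,i-3k+2} + X_{i-2k+2,i-1} + N_i^* + N_{i+1,n})).
\end{aligned} \qquad (16)$$

Note that $\mathcal{L}N_i^* = \mathcal{L}N_i$ and $N_i^*$ is independent of $\mathbf{X}_{1,i-1}$; also that $\mathcal{L}N^*_{1,i-3k+2} =
\mathcal{L}N_{1,i-3k+2}$ and $N^*_{1,i-3k+2} = g_1(U_{i-3k+2}, X_{1,i-3k+2})$ is independent of $\mathbf{X}_{i-2k+2,i}$ and $N_i^*$.
Therefore, we have that

$$\begin{aligned}
\mathcal{L}(X_{1,i-1} + N_i^* + N_{i+1,n}) &= \mathcal{L}(X_{1,i-1} + N_i + N_{i+1,n}), \\
\mathcal{L}(N^*_{1,i-3k+2} + X_{i-2k+2,i} + N_{i+1,n}) &= \mathcal{L}(N_{1,i-3k+2} + X_{i-2k+2,i} + N_{i+1,n}), \\
\mathcal{L}(N^*_{1,i-3k+2} + X_{i-2k+2,i-1} + N_i^* + N_{i+1,n}) &= \mathcal{L}(N_{1,i-3k+2} + X_{i-2k+2,i-1} + N_i + N_{i+1,n}).
\end{aligned}$$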

Using the above relations, inequality (16) is equivalent to

$$d_{TV}(\mathcal{L}(X_{1,i-1} + X_i + N_{i+1,n}), \mathcal{L}(X_{1,i-1} + N_i + N_{i+1,n})) \le 2a_i + b_i, \qquad (17)$$

where

$$a_i = P(X_{1,i-2k+1} \neq N^*_{1,i-3k+2},\; X_i \neq N_i^*),$$

$$b_i = d_{TV}(\mathcal{L}(N_{1,i-3k+2} + X_{i-2k+2,i-1} + X_i + N_{i+1,n}), \mathcal{L}(N_{1,i-3k+2} + X_{i-2k+2,i-1} + N_i + N_{i+1,n})).$$

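The random variables $X_i, N_i^*$ are independent of $X_1, \ldots, X_{i-2k+1}, N^*_{1,i-3k+2}$ and, hence,
it is easy to see that

$$\begin{aligned}
a_i &= P(X_{1,i-3k+2} + X_{i-3k+3,i-2k+1} \neq N^*_{1,i-3k+2})\, P(X_i \neq N_i^*) \\
&\le \big(P(X_{1,i-3k+2} \neq N^*_{1,i-3k+2}) + P(X_{i-3k+3,i-2k+1} \neq 0)\big)\, P(X_i \neq N_i^*) \\
&\le \Big(d_{TV}(\mathcal{L}X_{1,i-3k+2}, \mathcal{L}N_{1,i-3k+2}) + \sum_{j=i-3k+3}^{i-2k+1} p_j\Big)\, P(X_i \neq N_i^*).
\end{aligned} \qquad (18)$$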

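Using relation (15) above along with $\mathcal{L}(\mathbf{Z}_{i-k+1,i-1}, \mathbf{X}_{i-k+1,i-1}, N_i^*) = \mathcal{L}(\mathbf{Z}_{i-k+1,i-1},
\mathbf{X}_{i-k+1,i-1}, N_i)$, we observe that

$$P(X_i \neq N_i^*) = d_{TV}(\mathcal{L}(\mathbf{Z}_{i-k+1,i-1}, \mathbf{X}_{i-k+1,i-1}, X_i), \mathcal{L}(\mathbf{Z}_{i-k+1,i-1}, \mathbf{X}_{i-k+1,i-1}, N_i))$$

and applying Lemma 4 with $X = X_i$, $Y = N_i$, $Z = (\mathbf{Z}_{i-k+1,i-1}, \mathbf{X}_{i-k+1,i-1})$, $W =
(\mathbf{Z}_{i-k+1,i-1}, \mathbf{0})$, we deduce

$$\begin{aligned}
P(X_i \neq N_i^*) &\le 2P(X_i \neq N_i, \mathbf{X}_{i-k+1,i-1} \neq \mathbf{0}) + d_{TV}(\mathcal{L}(\mathbf{Z}_{i-k+1,i-1}, \mathbf{0}, X_i), \mathcal{L}(\mathbf{Z}_{i-k+1,i-1}, \mathbf{0}, N_i)) \\
&\le 2P(X_i \neq 0, \mathbf{X}_{i-k+1,i-1} \neq \mathbf{0}) + 2P(N_i \neq 0)\, P(\mathbf{X}_{i-k+1,i-1} \neq \mathbf{0}) + d_{TV}(\mathcal{L}X_i, \mathcal{L}N_i) \\
&\le 2P(X_i \neq 0, \mathbf{X}_{i-k+1,i-1} \neq \mathbf{0}) + 2p_i \sum_{j=i-k+1}^{i-1} p_j + p_i^2.
\end{aligned} \qquad (19)$$

Next, we consider a random variable $N_i^{\perp}$ with $\mathcal{L}N_i^{\perp} = \mathcal{L}N_i$, independent of all other random
variables involved in our analysis. Applying the inequality (6) with $W = N_{i-3k+3,i}$
and assuming that $P(N_{i-3k+3,i} \neq 0) < 1/2$ (which is valid since we have assumed that
$m < \log 2$), we get

$$\begin{aligned}
b_i &= d_{TV}(\mathcal{L}(N_{1,i-3k+2} + X_{i-2k+2,i-1} + X_i + N_{i+1,n}), \mathcal{L}(N_{1,i-3k+2} + X_{i-2k+2,i-1} + N_i^{\perp} + N_{i+1,n})) \\
&\le (1 - 2P(N_{i-3k+3,i} \neq 0))^{-1}\, d_{TV}(\mathcal{L}(N_{1,n} + X_{i-2k+2,i}), \mathcal{L}(N_{1,n} + X_{i-2k+2,i-1} + N_i^{\perp})).
\end{aligned}$$

Finally, using Lemma 1, we derive

$$b_i \le \frac{\frac{1}{2}\|\Delta^2 f_{CP(\lambda_n,F_n)}\|_1}{1 - 2\big(1 - e^{-\sum_{j=i-3k+3}^{i} p_j}\big)}\, \zeta_2(\mathcal{L}X_{i-2k+2,i}, \mathcal{L}(X_{i-2k+2,i-1} + N_i^{\perp})). \qquad (20)$$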

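Combining (17)-(19) with (20), we obtain, for all $i = 1, 2, \ldots, n$, the inequality

$$\begin{aligned}
&d_{TV}(\mathcal{L}(X_{1,i-1} + X_i + N_{i+1,n}), \mathcal{L}(X_{1,i-1} + N_i + N_{i+1,n})) \\
&\quad \le 2\Big(d_{TV}(\mathcal{L}X_{1,i-3k+2}, \mathcal{L}N_{1,i-3k+2}) + \sum_{j=i-3k+3}^{i-2k+1} p_j\Big) \times \Big(2P(X_i \neq 0, \mathbf{X}_{i-k+1,i-1} \neq \mathbf{0}) + 2p_i \sum_{j=i-k+1}^{i-1} p_j + p_i^2\Big) \\
&\qquad + \frac{\frac{1}{2}\|\Delta^2 f_{CP(\lambda_n,F_n)}\|_1}{1 - 2(1 - e^{-m})}\, \zeta_2(\mathcal{L}X_{i-2k+2,i}, \mathcal{L}(X_{i-2k+2,i-1} + N_i^{\perp}))
\end{aligned}$$

and the final result follows immediately by virtue of the Lindeberg decomposition (triangle
inequality)

$$d_{TV}(\mathcal{L}X_{1,n}, \mathcal{L}N_{1,n}) \le \sum_{i=1}^{n} d_{TV}(\mathcal{L}(X_{1,i-1} + X_i + N_{i+1,n}), \mathcal{L}(X_{1,i-1} + N_i + N_{i+1,n})). \qquad \square$$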

Remark 4. The upper bound $UB'_{CP}$ in Theorem 11 is composed of two terms, the first of
which is the quantity $C_n$, which is analogous to the term $(\sum p_i^2)^2$ appearing in Theorem
9 that concerns the independent summands case. As it was for $(\sum p_i^2)^2$, the term $C_n$
tends to 0 faster than the second term of $UB'_{CP}$, under certain asymptotic conditions.
Therefore, under these conditions, the order of $UB'_{CP}$ coincides with the order of the
second term.



Remark 5. If $X_1, X_2, \ldots, X_n$ are k-dependent Bernoulli random variables then, similarly
to Corollary 10, Theorem 11 implies a Poisson approximation result. Specifically,
Theorem 11 can now be written with $Po(p_i)$ in place of $CP(p_i, G_i)$ and $Po(\lambda_n)$
in place of $CP(\lambda_n, F_n)$. Consequently, the norm $\|\Delta^2 f_{Po(\lambda_n)}\|_1$ will appear instead of
$\|\Delta^2 f_{CP(\lambda_n,F_n)}\|_1$.

The upper bound $UB'_{CP}$ in Theorem 11 may seem difficult to apply in its present form.
For this reason, we present the following corollary which provides two slightly worse, but
more easily computable, upper bounds. The bound (a) is valid without any assumption
on the form of dependence among the $X_i$'s. The bound (b) is smaller than (a), but is
valid only when the $X_i$'s exhibit a certain weak form of positive/negative dependence. We
recall that two random variables $Y_1, Y_2$ are called positively quadrant dependent (PQD)
if

$$P(Y_1 > x_1, Y_2 > x_2) \ge P(Y_1 > x_1)\, P(Y_2 > x_2) \quad \text{for all } x_1, x_2 \qquad (21)$$

and negatively quadrant dependent (NQD) if (21) holds, but with the inequality sign
reversed. Manifestly, if $X_1, X_2, \ldots, X_n$ are associated (resp., negatively associated), then
the random variables $X_j + X_{j+1} + \cdots + X_{i-1}$ and $X_i$ are PQD (resp., NQD) for every
$1 \le j < i \le n$. Therefore, part (b) of the next corollary remains valid under the stronger
condition of association or negative association of the $X_i$'s.

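Corollary 12. (a) Let $X_1, X_2, \ldots, X_n \in \mathbb{Z}_+$ be k-dependent random variables (defined
as in (14)) with finite second moments. Then, for $m := \max_i \sum_{j=i-3k+3}^{i} p_j < \log 2$, $p_i =
P(X_i \neq 0)$,

$$d_{TV}\Big(\mathcal{L}\Big(\sum_{i=1}^n X_i\Big), CP(\lambda_n, F_n)\Big) \le C_n + \frac{\|\Delta^2 f_{CP(\lambda_n,F_n)}\|_1}{2(1 - 2(1 - e^{-m}))} \sum_{i=1}^{n} \Big( \sum_{j=i-k+1}^{i-1} \big(E(X_i X_j) + E(X_i)E(X_j)\big) + \frac{1}{2}E(X_i)^2 \Big) := UB''_{CP},$$

where

$$C_n := 2\sum_{i=1}^{n} \Big( 2\sum_{j=1}^{i-3k+2} \Big( \sum_{t=j-k+1}^{j-1} \big(P(X_t X_j \neq 0) + p_t p_j\big) + \frac{1}{2}p_j^2 \Big) + \sum_{j=i-3k+3}^{i-2k+1} p_j \Big) \times \Big( 2\sum_{j=i-k+1}^{i-1} \big(P(X_j \neq 0, X_i \neq 0) + p_i p_j\big) + p_i^2 \Big)$$

and $\lambda_n = \sum_{i=1}^n p_i$, $F_n = \sum_{i=1}^n \frac{p_i}{\lambda_n} G_i$, $G_i(x) = P(X_i \le x \mid X_i \neq 0)$, $x \in \mathbb{R}$ ($X_i = 0$ for all
$i < 1$).

(b) If, in addition, the random variables $X_j + \cdots + X_{i-1}$ and $X_i$ are PQD or NQD for
every $1 \le j < i \le n$, then the bound $UB''_{CP}$ in (a) is valid with $|\mathrm{Cov}(X_i, X_j)|$ in place of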



$E(X_i X_j) + E(X_i)E(X_j)$ and



3-1 1 i-1 

2~^'~J' "'" 1 ~ J / j \~ \~-"--J i "l 1 rt-rj/ • 2* 

t=j—k+l t=j-fc+l 



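Proof. (a) This follows readily from Theorem 11 by applying Corollary 7 above, Corollary
7 in Boutsikas (2006) and the fact that

$$\zeta_2\Big(\mathcal{L}\Big(\sum_{j=i-2k+2}^{i-1} X_j + X_i^{\perp}\Big), \mathcal{L}\Big(\sum_{j=i-2k+2}^{i-1} X_j + N_i\Big)\Big) \le \zeta_2(\mathcal{L}X_i, \mathcal{L}N_i) = \frac{1}{2}E(X_i)^2$$

($N_i \sim CP(p_i, G_i)$, and $X_i^{\perp}$ an independent copy of $X_i$), which is a consequence of the
regularity property of $\zeta_2$ combined with equality (8).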

(b) This is again a direct consequence of Theorem 11. Set $W = \sum_{j=i-2k+2}^{i-1} X_j$ and
let $X_i^{\perp}$ be a random variable independent of all other random variables involved in our
analysis with $\mathcal{L}X_i = \mathcal{L}X_i^{\perp}$. Assume that $X_j + \cdots + X_{i-1}$ and $X_i$ are PQD for all $j < i$.
Thus, $W$ and $X_i$ are PQD and hence $W + X_i$ is larger than $W + X_i^{\perp}$ with respect to the
convex order (see Section 3.3 in Boutsikas and Vaggelatou (2002)). Therefore,

$$\zeta_2(\mathcal{L}(W + X_i), \mathcal{L}(W + X_i^{\perp})) = \frac{1}{2}\big(\mathrm{Var}(W + X_i) - \mathrm{Var}(W + X_i^{\perp})\big) = \sum_{j=i-k+1}^{i-1} \mathrm{Cov}(X_i, X_j).$$

Since $W$ is independent of $X_i^{\perp}, N_i$, the regularity property of $\zeta_2$ and equality (8) guarantee
that

$$\zeta_2(\mathcal{L}(W + X_i^{\perp}), \mathcal{L}(W + N_i)) \le \zeta_2(\mathcal{L}X_i^{\perp}, \mathcal{L}N_i) = \frac{1}{2}E(X_i)^2.$$

Hence, using the triangle inequality and the above two relations, we deduce that

$$\zeta_2(\mathcal{L}(W + X_i), \mathcal{L}(W + N_i)) \le \sum_{j=i-k+1}^{i-1} \mathrm{Cov}(X_j, X_i) + \frac{1}{2}E(X_i)^2.$$

Furthermore, from (3) and Theorem 7 in Boutsikas and Vaggelatou (2002), we get that

$$\begin{aligned}
d_{TV}\Big(\mathcal{L}\Big(\sum_{j=1}^{i-3k+2} X_j\Big), CP(\lambda_{i-3k+2}, F_{i-3k+2})\Big) &\le 2\zeta_2\Big(\mathcal{L}\Big(\sum_{j=1}^{i-3k+2} X_j\Big), CP(\lambda_{i-3k+2}, F_{i-3k+2})\Big) \\
&= 2\sum_{t=2}^{i-3k+2} \sum_{j=t-k+1}^{t-1} \mathrm{Cov}(X_j, X_t) + \sum_{j=1}^{i-3k+2} E(X_j)^2.
\end{aligned}$$

Similar reasoning proves the NQD random variables case (in place of all $\mathrm{Cov}(X_j, X_t)$,
we now get $-\mathrm{Cov}(X_j, X_t) \ge 0$). $\square$

5. Illustrating applications 

The purpose of this section is to illustrate the applicability and effectiveness of the results
presented in the previous sections. These results are applicable to a wide variety of
problems involving locally dependent random variables that rarely differ from zero (for
example, in risk theory, extreme value theory, reliability theory, run and scan statistics,
graph theory and biomolecular sequence analysis). The approximation method described
in this paper, as with almost all other methods used for Poisson approximation in the
past, requires the computation of only the first- and second-order moments of the variables
involved. From this fact, it is understood that the bounds presented can be applied
almost directly to many of the problems where other Poisson approximation methods
have been elaborated in the past, for example, the Stein-Chen method. The main benefit
of the present method is the smoothness factor that substantially improves the approximation
error bound in many cases, while the main disadvantage is the additional
term $C_n$. Therefore, the conclusion here is that we usually obtain improved bounds for
moderate or small values of $\lambda$.

5.1. The number of overlapping runs of length k in i.i.d. trials

Let $\{Z_i\}_{i \in \mathbb{Z}}$ be a sequence of i.i.d. binary trials with outcomes 0 (failure) and 1 (success),
and where $P(Z_i = 1) = p = 1 - q$. We are interested in approximating the distribution
of the number of (rare) overlapping success runs of length $k$ within trials $1, 2, \ldots, n$.
This problem has been studied in various ways by many authors in the past; see, for
example, Barbour, Holst and Janson (1992), Balakrishnan and Koutras (2002) and the
references therein. We shall first derive a Poisson and then a compound Poisson approximation.

(a) Poisson approximation. If we assume that $p \to 0$ and $n \to \infty$, then the occurrences
of success runs are rare and asymptotically independent, and a Poisson approximation
seems suitable. We use the binary random variables

$$X_i = Z_i Z_{i+1} \cdots Z_{i+k-1}, \quad i = 1, 2, \ldots, n - k + 1.$$

Obviously, the random variable $\sum_{i=1}^{n-k+1} X_i$ counts the total number of appearances of
overlapping success runs with length $k$ which appear within the first $n$ trials. The random
variables $X_1, X_2, \ldots, X_{n-k+1}$ are k-dependent, can be written as in (14) and are associated
as coordinatewise non-decreasing functions of independent random variables. Thus,
they satisfy the dependence condition required by Corollary 12(b). A direct application
of this corollary for $m = (3k - 2)p^k < \log 2$ yields
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2(1 -e" 



i=i 

/n-k+l i-1 r l -k+^ 

x E E (f-^- P ^) + ^±± P ^ 

\ i=2 j=max{l,i-fc+l} / 

< Cn-k+l H ; ; — 7T 1 — K — 2 H — on 

" 71 k+1 2q(l-2(l-C-» 1 )) V V ?/ 

where pi = P(Xi = 1) =p k ,\ = (n — k + l)p k and 

n-k+l / i-3fc+2 t-1 i-3fc+2 

c„_ fc+1 = 2 E 2 E E ( P ^ +k -P 2k )+ E ? 2fe +(*-i)/ 

i=l V 4=2 j=max{l,*-fc+l} j=l 

2fe \ 



x 2 ^ p I - J+fc + (2/c-l)p : 

\ j=max{l,i— fc+1} / 



<4— — l-p^-g fc-Tr b 



2r a 



Therefore, for m = (3k — 2)p k < log 2, 



/ n-k+l N 



! = 1 



< KB nj , := 4 



g V A / 2g(l-2(l-e-' 



2g(l-2(l-e" m ))' 
In addition, if $n \to \infty$, $p \to 0$ ($k > 1$ fixed), then

$$UB_{n,p} \sim \begin{cases} \dfrac{\lambda \|\Delta^2 f_{Po(\lambda)}\|_1}{2q}\, p, & \text{when } \lambda \text{ is fixed}, \\[2ex] \dfrac{2}{q\sqrt{2\pi e}}\, p, & \text{when } \lambda \to \infty \text{ and } p\lambda^2 \to 0, \end{cases}$$

where $\|\Delta^2 f_{Po(\lambda)}\|_1$ (which is less than $4(1 \wedge \frac{1}{3\lambda})$) is given in Proposition 3. For the same
distance, a bound obtained by the Stein-Chen method (see, for example, Barbour, Holst
and Janson (1992), page 163) is nearly equal to $2p/q$, which, provided that $p\lambda^2 \approx 0$ and
for moderate or large values of $\lambda$, is nearly four times larger than $UB_{n,p}$ ($\sqrt{2\pi e} \approx 4.1327$).
For $\lambda \approx 1$, it is nearly three times larger.

(b) Compound Poisson approximation. The bound described in (a) cannot help when
we assume that $n \to \infty$, $k \to \infty$ and $p$ is fixed. Under these conditions, the occurrences
of success runs are again rare, but they are no longer asymptotically independent. This
happens because if a success run occurs (starts) at trial $i$ (that is, $Z_i = \cdots = Z_{i+k-1} = 1$),
then, with probability $p$, we shall also observe an overlapping success run starting at
position $i + 1$, and so forth. Thus, when a success run is observed at some trial, it is likely
that a number of success runs will follow at the next trials. This "cluster" of adjacent
success runs is usually called a "clump". So, now that $n \to \infty$ and $k \to \infty$, we expect that
the occurrences of clumps are rare and asymptotically independent, while each clump
consists of an asymptotically geometrically distributed number of overlapping success
runs. Obviously, this situation readily calls for a compound Poisson approximation result.
To achieve this, let $Y_1, Y_2, \ldots, Y_{n-k+1}$ represent the sizes of the clumps started at trials
$1, 2, \ldots, n - k + 1$, respectively. If $Y_i = 0$, then we obviously mean that no clump has
started at position $i$. This well-known technique is called "declumping". More formally,
set

$$Y_i := (1 - Z_{i-1}) \sum_{r=0}^{n-i-k+1} \prod_{j=i}^{i+k+r-1} Z_j, \quad i = 2, 3, \ldots, n - k + 1, \quad \text{and} \quad Y_1 := \sum_{r=0}^{n-k} \prod_{j=1}^{k+r} Z_j$$

to be the size of a clump starting at position $i$ (that is, the number of adjacent overlapping
success runs until trial $n$). Clearly, $\sum_{i=1}^{n-k+1} Y_i$ is equal to $\sum_{i=1}^{n-k+1} X_i$, the total number
of overlapping success runs within trials $1, 2, \ldots, n$. In this case, it is computationally
more convenient to use the stationary, locally dependent random variables

$$Y'_i := (1 - Z_{i-1}) \sum_{r=0}^{k-1} \prod_{j=i}^{i+k+r-1} Z_j, \quad i = 1, 2, \ldots, n - k + 1,$$

which represent the truncated sizes of clumps (their sizes cannot be greater than $k$)
starting at positions $1, 2, \ldots, n - k + 1$. In order to obtain stationarity, we have also
allowed the last clumps to extend further than trial $n$. When $k, n$ increase so that the
expected number of runs $(n - k + 1)p^k$ remains bounded, the processes $\mathbf{Y} = (Y_i)$, $\mathbf{Y}' = (Y'_i)$
rarely differ. This is expressed by the following inequality (see Boutsikas (2006), page
511):

$$d_{TV}(\mathcal{L}(\mathbf{Y}), \mathcal{L}(\mathbf{Y}')) \le P(\mathbf{Y} \neq \mathbf{Y}') \le (n - 2k + 1)qp^{2k} + 2p^{k+1}. \qquad (22)$$
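The declumping construction can be made concrete with a few lines of code. The sketch below is only an illustration under stated assumptions: the trials are stored in a list `z` with `z[i]` playing the role of $Z_i$ (index 0 holds $Z_0$, and the list is long enough to cover trials beyond $n$ for the truncated version), and the function names are hypothetical. It computes the run indicators $X_i$, the clump sizes $Y_i$ and the truncated clump sizes $Y'_i$ as defined above.

```python
import random


def run_indicators(z, n, k):
    """X_i = Z_i * ... * Z_{i+k-1} for i = 1, ..., n-k+1 (z[i] is Z_i)."""
    return [int(all(z[j] for j in range(i, i + k))) for i in range(1, n - k + 2)]


def clump_sizes(z, n, k):
    """Y_i: number of overlapping runs in the clump starting at trial i, up to trial n."""
    sizes = []
    for i in range(1, n - k + 2):
        size = 0
        # A clump starts at i if i = 1 or the previous trial is a failure.
        if i == 1 or z[i - 1] == 0:
            r = 0
            # Term r of the sum equals 1 iff Z_i = ... = Z_{i+k+r-1} = 1.
            while i + k + r - 1 <= n and all(z[j] for j in range(i, i + k + r)):
                size += 1
                r += 1
        sizes.append(size)
    return sizes


def truncated_clump_sizes(z, n, k):
    """Y'_i: as Y_i, but with at most k runs per clump and allowed to look past trial n."""
    sizes = []
    for i in range(1, n - k + 2):
        size = 0
        if z[i - 1] == 0:
            r = 0
            while r <= k - 1 and all(z[j] for j in range(i, i + k + r)):
                size += 1
                r += 1
        sizes.append(size)
    return sizes


def simulate_trials(n, k, p, rng):
    """Draw Z_0, ..., Z_{n+2k} as i.i.d. Bernoulli(p) trials."""
    return [1 if rng.random() < p else 0 for _ in range(n + 2 * k + 1)]
```

On any realization, `sum(clump_sizes(z, n, k))` coincides with `sum(run_indicators(z, n, k))`, which is the identity $\sum Y_i = \sum X_i$ stated above, while `truncated_clump_sizes` differs from `clump_sizes` only on the rare events controlled by (22).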

We can now use Corollary 12(a) to establish an upper bound for $d_{TV}(\mathcal{L}(\sum Y'_i), CP)$. We
verify that the random variables $Y'_1, Y'_2, \ldots, Y'_{n-k+1} \in \mathbb{Z}_+$ can be written as in (14) and
that they are also 2k-dependent. Obviously, $p_i = P(Y'_i \neq 0) = qp^k$. For $m = (6k - 2)qp^k <
\log 2$, Corollary 12(a) yields the inequality

$$d_{TV}\Big(\mathcal{L}\Big(\sum_{i=1}^{n-k+1} Y'_i\Big), CP(\lambda, F_k)\Big) \le c_{n-k+1} + \frac{\|\Delta^2 f_{CP(\lambda,F_k)}\|_1}{2(1 - 2(1 - e^{-m}))} \times \sum_{i=1}^{n-k+1} \Big( \sum_{j=i-2k+1}^{i-1} \big(E(Y'_j Y'_i) + E(Y'_j)E(Y'_i)\big) + \frac{1}{2}E(Y'_i)^2 \Big)$$

(we assume that $Y'_i = 0$ for $i < 1$) with

$$c_{n-k+1} \le 2\sum_{i=1}^{n-k+1} \Big( 2\sum_{j=2}^{i-6k+2} \sum_{t=j-2k+1}^{j-1} \big(P(Y'_t Y'_j \neq 0) + (qp^k)^2\big) + \sum_{j=1}^{i-6k+2} (qp^k)^2 + 2kqp^k \Big) \times \Big( 2\sum_{j=i-2k+1}^{i-1} P(Y'_j \neq 0, Y'_i \neq 0) + 4k(qp^k)^2 \Big)$$

and $\lambda = (n - k + 1)qp^k$, $F_k(x) = P(Y'_i \le x \mid Y'_i \neq 0)$, $x \in \mathbb{R}$. Notice that, for $i > 2k$, $P(Y'_i \neq
0, Y'_j \neq 0)$ is now equal to $q^2p^{2k}$ for $j = i - 2k + 1, \ldots, i - k - 1$, while it vanishes when
$j = i - k, \ldots, i - 1$. Moreover, $E(Y'_i) = q\sum_{r=0}^{k-1} p^{k+r} = p^k(1 - p^k)$, $i = 1, 2, \ldots, n - k + 1$,
whereas ($i > 2k$)

$$\begin{aligned}
E(Y'_j Y'_i) &= E\Big((1 - Z_{j-1})\Big(\prod_{l=j}^{j+k-1} Z_l + \prod_{l=j}^{j+k} Z_l + \cdots + \prod_{l=j}^{i-2} Z_l\Big) Y'_i\Big) \\
&= qp^k\, \frac{1 - p^{i-j-k}}{1 - p}\, E(Y'_i) = p^{2k}(1 - p^{i-j-k})(1 - p^k), \quad i - 2k + 1 \le j \le i - k - 1,
\end{aligned}$$

and $E(Y'_j Y'_i) = 0$ for $i - k \le j \le i - 1$. So, for $i > 2k$, we get

$$\sum_{j=i-2k+1}^{i-1} \big(E(Y'_j Y'_i) + E(Y'_j)E(Y'_i)\big) = p^{2k}(1 - p^k)\Big(\Big(k - 1 - p\,\frac{1 - p^{k-1}}{1 - p}\Big) + (2k - 1)(1 - p^k)\Big) \le p^{2k}(3k - 2)$$

and, thus, for $m = (6k - 2)qp^k < \log 2$,

(n-fe+l \ 
£ £ ^',CP(A,F fc )J 



2 W , I^IIAVcpca^IIiA^, 



(23) 



< KB„, fc := ( 1 + - j (6Afc g? /) 2 + , 1 _ 2( 7_ v ;:r ) " ~(6fc - 3)p fc , 
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where A = (n — k + l)qp k and, for x = 1, 2, . . . , k — 1 
$$F_k(x) = P(Y'_1 \le x \mid Y'_1 \neq 0) = 1 - p^x$$

is the geometric distribution truncated at $k$ ($F_k(k) = 1$). It can be verified that for large $\lambda$,
$CP(\lambda, F_k) \approx N(\lambda E(W), \lambda E(W^2))$ with $W \sim F_k$ and, according to Remark 1, we expect
that

$$\|\Delta^2 f_{CP(\lambda,F_k)}\|_1 \sim \frac{4}{\lambda E(W^2)\sqrt{2\pi e}} \quad \text{as } \lambda \to \infty.$$

In order to illustrate the above asymptotic relation, we present below a table with
the exact value of the norm $\|\Delta^2 f_{CP(\lambda,F_k)}\|_1$ and its approximation $4/(\lambda E(W^2)\sqrt{2\pi e})$ for
several values of $\lambda, p$ (see Table 1). We assume that $k \to \infty$, that is, $F_k$ is the ordinary
geometric distribution and thus $E(W) = 1/q$, $V(W) = p/q^2$ and $E(W^2) = (1 + p)/q^2$.
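The comparison between the exact norm and its normal-based approximation can be reproduced numerically. The sketch below computes the compound Poisson probabilities with the standard Panjer recursion for a geometric compounding distribution ($g(j) = qp^{j-1}$, $j \ge 1$) and then takes the $\ell_1$-norm of their second difference. It is an illustration only: the function names, the truncation rule and the choice of the Panjer recursion are assumptions made here, not the computational method of the paper.

```python
import math


def compound_poisson_pmf(lam, severity, kmax):
    """Panjer recursion for CP(lam, G) with G supported on {1, 2, ...}.

    f(0) = exp(-lam), f(k) = (lam / k) * sum_{j=1..k} j * g(j) * f(k - j).
    """
    f = [math.exp(-lam)]
    for k in range(1, kmax + 1):
        s = sum(j * severity(j) * f[k - j] for j in range(1, k + 1))
        f.append(lam * s / k)
    return f


def second_difference_l1(f):
    """l1-norm of the second difference of a pmf, zero-padded on both sides."""
    g = [0.0, 0.0] + list(f) + [0.0, 0.0]
    return sum(abs(g[k] - 2.0 * g[k - 1] + g[k - 2]) for k in range(2, len(g)))


def polya_aeppli_norm(lam, p):
    """Norm of the second difference of CP(lam, G), G geometric with parameter p."""
    q = 1.0 - p
    mean = lam / q
    var = lam * (1.0 + p) / q**2
    kmax = int(mean + 12.0 * math.sqrt(var) + 50)  # heuristic truncation point
    f = compound_poisson_pmf(lam, lambda j: q * p ** (j - 1), kmax)
    return second_difference_l1(f)


def normal_approximation(lam, p):
    """The approximation 4 / (lam * E(W^2) * sqrt(2 * pi * e)), E(W^2) = (1 + p) / q^2."""
    q = 1.0 - p
    return 4.0 / (lam * (1.0 + p) / q**2 * math.sqrt(2.0 * math.pi * math.e))
```

Calling `polya_aeppli_norm(lam, p)` and `normal_approximation(lam, p)` over a grid of $\lambda$ and $p$ gives the two columns compared in Table 1; the recursion costs $O(k_{\max}^2)$ operations, which is acceptable for the parameter values considered here.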

As expected, the above approximation is satisfactory for moderate and large values
of $\lambda$. Moreover, we observe that it becomes better when $p$ decreases. Assuming that
$n, k \to \infty$ with $p \in (0, 1)$ fixed, the compound Poisson approximation error bound in (23)
is of order

$$UB_{n,k} \sim \begin{cases} \dfrac{\lambda}{4q}\|\Delta^2 f_{CP(\lambda,F_k)}\|_1 \cdot 6kp^k, & \text{when } \lambda = (n - k + 1)qp^k \text{ is fixed}, \\[2ex] \dfrac{6q}{(1 + p)\sqrt{2\pi e}}\, kp^k, & \text{when } \lambda \to \infty, \text{ such that } \lambda^2 kp^k \to 0. \end{cases}$$



For almost the same distance as in (23), the Stein-Chen method offers a bound UBcs 
such that 

UBcs- ^^-^h kp* whenp<l or UB cs ~-\kp k when p < \ 
q z (l — 2p) 3 1 — bp 5 

(see, for example, Barbour and Chryssaphinou (2001)). Note that for values of p > 1/3, 
the Stein-Chen method yields bounds of order 0(kp k +e" afcA ) or 0(Xkp k ). The UB n>k 
is smaller provided that X 2 kp k ps and is of order 0(kp k ) for all values of p. 



Table 1. Exact values of the norm $\|\Delta^2 f_{CP(\lambda,F_k)}\|_1$ (norm) and of its approximation $4/(\lambda E(W^2)\sqrt{2\pi e})$ (approx.)

              λ = 1                 λ = 5                 λ = 10                λ = 100
          norm     approx.      norm      approx.     norm      approx.     norm      approx.
p = 0.2   0.97120  0.516204     0.115414  0.103241    0.054341  0.051620    0.005189  0.005162
p = 0.5   1.10364  0.161314     0.040737  0.032263    0.017866  0.016131    0.001628  0.001613
p = 0.8   1.32437  0.021509     0.019508  0.004302    0.002474  0.002151    0.000218  0.000215
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It is worth mentioning that here, instead of Corollary 12(a), we could employ Corollary 
12(b) to obtain a bound even better than $UB_{n,k}$. Specifically, it can be proven that for
every $1 \le j < i \le n$, the random variables $Y'_j + \cdots + Y'_{i-1}$ and $Y'_i$ are NQD. Hence, we can
use Corollary 12(b) and, following an exact parallel to the above procedure, we derive
the improved bound

which, asymptotically, is about three times smaller than $UB_{n,k}$.

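Finally, we can approximate $\sum_{i=1}^{n-k+1} Y_i = \sum_{i=1}^{n-k+1} X_i$, the total number of overlapping
success runs within trials $1, 2, \ldots, n$, by $CP(\lambda, G)$, where $G$ denotes the ordinary
geometric distribution with parameter $p$. In this case, $CP(\lambda, G)$ is also known as the
Pólya-Aeppli distribution with parameters $\lambda, p$ and will be denoted by $PA(\lambda, p)$. Using
the triangle inequality, the distance $d_{TV}(\mathcal{L}(\sum_{i=1}^{n-k+1} X_i), PA(\lambda, p))$ is bounded above by

$$d_{TV}\Big(\mathcal{L}\Big(\sum_{i=1}^{n-k+1} Y_i\Big), \mathcal{L}\Big(\sum_{i=1}^{n-k+1} Y'_i\Big)\Big) + d_{TV}\Big(\mathcal{L}\Big(\sum_{i=1}^{n-k+1} Y'_i\Big), CP(\lambda, F_k)\Big) + d_{TV}(CP(\lambda, F_k), PA(\lambda, p)).$$

The first $d_{TV}$ is bounded by (22), the second by (23), whereas for the third, we
have ($W_i, U_i$ are independent random variables with $W_i \sim F_k$ and $U_i \sim G$)

$$d_{TV}\Big(\mathcal{L}\Big(\sum_{i=1}^{N} W_i\Big), \mathcal{L}\Big(\sum_{i=1}^{N} U_i\Big)\Big) \le \lambda\, d_{TV}(\mathcal{L}W_i, \mathcal{L}U_i) = \lambda p^k.$$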
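The identity $d_{TV}(\mathcal{L}W_i, \mathcal{L}U_i) = p^k$ used in the last step is easy to confirm numerically. The sketch below assumes that $F_k$ puts mass $qp^{x-1}$ on $x = 1, \ldots, k-1$ and the remaining mass $p^{k-1}$ on $x = k$ (the geometric distribution truncated at $k$), while $G$ puts mass $qp^{x-1}$ on every $x \ge 1$. The function names and the summation cutoff `xmax` are illustrative choices.

```python
def truncated_geometric_pmf(p, k):
    """pmf of F_k: q * p^(x-1) for x = 1..k-1 and p^(k-1) at x = k."""
    q = 1.0 - p
    pmf = {x: q * p ** (x - 1) for x in range(1, k)}
    pmf[k] = p ** (k - 1)
    return pmf


def tv_truncated_vs_geometric(p, k, xmax=2000):
    """Total variation distance between F_k and the ordinary geometric G."""
    q = 1.0 - p
    f = truncated_geometric_pmf(p, k)
    total = 0.0
    for x in range(1, xmax + 1):
        total += abs(f.get(x, 0.0) - q * p ** (x - 1))
    # Mass of G beyond xmax, where F_k has no mass.
    total += p**xmax
    return 0.5 * total
```

For instance, `tv_truncated_vs_geometric(0.5, 4)` can be compared with `0.5 ** 4`.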
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